Abstract

We consider the problem of distributed computation of a target function over a two-user deterministic multiple-access channel. If the target and channel functions are matched (i.e., compute the same function), significant performance gains can be obtained by jointly designing the communication and computation tasks. However, in most situations there is mismatch between these two functions. In this work, we analyze the impact of this mismatch on the performance gains achievable with joint communication and computation designs over separation-based designs. We show that for most pairs of target and channel functions there is no such gain, and separation of communication and computation is optimal.

I. Introduction

The problem of computing a function from distributed information arises in many different contexts ranging from auctions and financial trading to sensor networks. In order to compute the desired target function, communication between the distributed users is required. If this communication takes place over a shared medium, such as in a wireless setting, the channel introduces interactions between the transmitted signals. This suggests the possibility of harnessing these signal interactions to facilitate the task of computing the desired target function. A fundamental question is therefore whether, by jointly designing encoders and decoders for communication and computation, we can improve the efficiency of distributed computation.

A. Summary of Results 

In this paper, we explore this question by considering computation of a function over a two-user multiple-access channel (MAC). In order to focus on the impact of the structural mismatch between the target and channel functions on the efficiency of computation, we ignore channel noise and consider only deterministic MACs here. More formally, the setting consists of two transmitters observing a (random) variable $u_1 \in \mathcal{U}$ and $u_2 \in \mathcal{U}$, respectively, and a receiver aiming to compute the function $a(u_1, u_2) \in \mathcal{W}$ of these variables. The two transmitters are connected to the destination through a deterministic MAC with inputs $x_1, x_2 \in \mathcal{X}$ and output $y = g(x_1, x_2) \in \mathcal{Y}$, where $g(\cdot,\cdot)$ describes the actions of the channel.

A straightforward achievable scheme for this problem is to separate the tasks of communication and computation: the transmitters communicate the values of $u_1$ and $u_2$ to the destination, which then uses these values to compute the desired target function $a(u_1, u_2)$. This requires the receiver to decode $2\log|\mathcal{U}|$ message bits. However, the MAC itself also computes a function $g(x_1, x_2)$ of the two inputs $x_1, x_2$, creating the opportunity of taking advantage of the structure of $g(\cdot,\cdot)$ to calculate $a(\cdot,\cdot)$. This is trivially possible when $g(\cdot,\cdot)$ and $a(\cdot,\cdot)$ are matched, i.e., compute the same function on their inputs. In such cases, performing the tasks of communication and computation jointly results in significantly fewer bits to be communicated. Indeed, in the matched case only the $\log|\mathcal{W}|$ bits describing the function value are recovered at the receiver. This could be considerably less than the $2\log|\mathcal{U}|$ bits resulting from the separation approach. Naturally, in most cases the channel $g(\cdot,\cdot)$ and the target function $a(\cdot,\cdot)$ are mismatched. The question is thus whether we can still obtain performance gains over separation in this mismatched situation. In other words, we ask whether, in general, the natural computation done by the channel can be harnessed to help with the computation of the desired target function.

We consider two cases: i) One-shot communication, where the MAC is used only once, but the channel input alphabet $\mathcal{X}$ and output alphabet $\mathcal{Y}$ are allowed to vary as a function of the domain $\mathcal{U}$ of the target function. In this case, performance is measured in terms of the scaling needed for the channel alphabets with respect to the computation alphabets, i.e., how $|\mathcal{X}|$ and $|\mathcal{Y}|$ grow with $|\mathcal{U}|$. This is closer to the formulation in the computer science literature. ii) Multi-shot communication, where the channel alphabets $\mathcal{X}$ and $\mathcal{Y}$ are of fixed size, but the channel can be used several times. In this case, performance is measured in terms of computation rate, i.e., how many channel uses are needed to compute the target function. This is closer to the formulation considered in information theory.

As the main result of this paper, we show that separation between computation and communication is essentially optimal for most¹ pairs $(a, g)$ of target and channel functions. In other words, the structural mismatch between the functions $a(\cdot,\cdot)$ and $g(\cdot,\cdot)$ is in general too strong for joint computation and communication designs to yield any performance gains.

We illustrate this with an example for one-shot communication. Assume that the variables $u_1, u_2$ at the transmitters take on a large range of values, say $|\mathcal{U}| = 2^{1000}$, and the receiver is only interested in knowing if $u_1 > u_2$, i.e., in a binary target function. Then for most MACs and one-shot communication, a consequence² of Theorems 1 and 2 in Section III (illustrated in Example 3) is that the transmitters need to convey the entire values of $u_1, u_2$ to the destination, which then simply compares them. Thus, even though the destination is interested in only a single bit about $(u_1, u_2)$, it is still necessary to transmit $2\log|\mathcal{U}| = 2000$ bits over the channel.

More generally, Theorems 1 and 2 in Section III together demonstrate that for most target functions separation of communication and computation is asymptotically optimal for most MACs. Example 4 illustrates that only for special functions like an equality check (i.e., checking whether $u_1 = u_2$) can we significantly improve upon the simple separation scheme. Intuitively, this is because the structural mismatch between most target and channel functions is too large to allow for any possibility of direct computation of the target function value without resorting to recovering the user messages first. The technical ideas that enable these observations are based on a connection with results in extremal graph theory, such as the existence of complete subgraphs and matchings of a given size in a bipartite graph. These connections might be of independent interest.

Similarly, for multi-shot communication, where we repeatedly use a fixed channel, Theorem 4 in Section III shows that for most functions, the computation rate is necessarily as small as that for the identity target function describing the entire variables $u_1, u_2$ at the destination. In other words, separation of communication and computation is again optimal for most target and channel functions. To prove this result, the usual approach using cut-set bound arguments is not tight enough. Indeed, Example 5 shows that the ratio between the upper bound on the computation rate obtained from the cut-set bound and the correct scaling derived in Theorem 4 can be unbounded. Rather, the structures of the target and channel functions have to be analyzed jointly.

These results show that, in general, there is little or no benefit in joint designs: computation-communication separation is optimal for most cases. We thus advocate in this paper that separation of computation and communication for multiple-access channels is not just an attractive option from an implementation point of view, but, except for special cases, actually entails little loss in efficiency.

¹ More precisely, among all target functions $a(\cdot,\cdot)$ with given domain $\mathcal{U}$ and range $\mathcal{W}$, and all channel functions $g(\cdot,\cdot)$ with given input alphabet $\mathcal{X}$ and output alphabet $\mathcal{Y}$, separation is optimal except for at most an exponentially small (in domain size $|\mathcal{U}|$) fraction of pairs.

² While the theorems only present results in the limit as $|\mathcal{U}| \to \infty$, it follows from the proofs that for a given domain $\mathcal{U}$ the statements hold for all but an exponentially small (in $|\mathcal{U}|$) fraction of channel functions.
B. Related Work

The problem of distributed function computation has a rich history and has been studied in many different contexts. In computer science, it has been studied under the branch of communication complexity; see, for example, [1] and references therein. Early seminal work by Yao [2] considered interactive communication between two parties. Among several other important results, the paper showed that the number of exchanged bits required to compute most target functions is as large as for the identity function. In the context of information theory, distributed function computation has been studied as an extension of distributed source coding in [3]-[5]. For example, Körner and Marton [3] showed that for the computation of the finite-field sum of correlated sources, linear codes can outperform random codes. This was extended to large networks represented as graphs in [6]-[8] and references therein. Randomized gossip algorithms [9] have been proposed as practical schemes for information dissemination in large unreliable networks and were studied in the context of distributed computation in [9], [10] among several others.

In most of these works, communication channels are represented as orthogonal point-to-point links. When the channel itself introduces signal interaction, as is the case for a MAC, there can be a benefit from jointly handling the communication and computation tasks, as illustrated in [11]. Function computation over MACs has been studied in [12]-[15] and references therein.

There is some work touching on the aspect of structural mismatch between the target and the channel functions. In [16], an example was given in which the mismatch between a linear target function with integer coefficients and a linear channel function with real coefficients can significantly reduce efficiency. In [15], it was conjectured that, for computation of finite-field addition over a real-addition channel, there could be a gap between the cut-set bound and the computation rate. In [17], mismatched computation when the network performs linear finite-field operations was studied. To the best of our knowledge, a systematic study of channel and computation mismatch is initiated in this work.

C. Organization 

The paper is organized as follows. In Section II, we formally introduce the questions studied in this paper. We present the main results along with illustrative examples in Section III. Most of the proofs are given in Section IV.

II. Problem Setting and Notation 

Throughout this paper, we use sans-serif font for random variables, e.g., u. We use bold lowercase and uppercase font to denote vectors and matrices, e.g., y and G. All sets are typeset in calligraphic font, e.g., $\mathcal{X}$. We denote by $\log(\cdot)$ and $\ln(\cdot)$ the logarithms to the base 2 and $e$, respectively.
Fig. 1. Computation over a deterministic multiple-access channel. Each user $i$ has access to an independent message $u_i$, and the receiver computes an estimate $\hat{w}$ of the target function $a(u_1, u_2)$ of those messages.

A discrete, memoryless, deterministic two-user MAC consists of two input alphabets $\mathcal{X}_1$ and $\mathcal{X}_2$, an output alphabet $\mathcal{Y}$, and a deterministic channel function $g\colon \mathcal{X}_1 \times \mathcal{X}_2 \to \mathcal{Y}$. Given channel inputs $x_1, x_2$, the output of the MAC is

$$y = g(x_1, x_2).$$
Each transmitter $i \in \{1, 2\}$ has access to an independent and uniformly distributed message $u_i \in \mathcal{U}_i$. The objective of the receiver is to compute a target function $a\colon \mathcal{U}_1 \times \mathcal{U}_2 \to \mathcal{W}$ of the user messages; see Fig. 1. Formally, each transmitter $i$ consists of an encoder $f_i\colon \mathcal{U}_i \to \mathcal{X}_i$ mapping the message into the channel input

$$x_i = f_i(u_i).$$

The receiver consists of a decoder $\phi\colon \mathcal{Y} \to \mathcal{W}$ mapping the channel output $y$ into an estimate

$$\hat{w} = \phi(y)$$

of the target function $a(u_1, u_2)$. The probability of error is

$$\mathbb{P}\big(a(u_1, u_2) \neq \phi(y)\big).$$

Remark: We point out that this differs from the ordinary communication setting, in which the decoder aims to recover both messages $(u_1, u_2)$. Instead, in the setting here, the decoder is not interested in $(u_1, u_2)$, but only in the value $a(u_1, u_2)$ of the target function.

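In the following, it will often be convenient to represent the target function $a(\cdot,\cdot)$ and the channel $g(\cdot,\cdot)$ by their corresponding matrices $A = (a_{u_1,u_2}) \in \mathcal{W}^{|\mathcal{U}_1| \times |\mathcal{U}_2|}$ and $G = (g_{x_1,x_2}) \in \mathcal{Y}^{|\mathcal{X}_1| \times |\mathcal{X}_2|}$, respectively. In other words,

$$a_{u_1,u_2} = a(u_1, u_2) \in \mathcal{W}, \qquad g_{x_1,x_2} = g(x_1, x_2) \in \mathcal{Y}.$$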

For $n \in \mathbb{N}$, denote by $G^{\otimes n}$ the $n$-fold use of the same channel matrix $G$. In other words, the matrix $G^{\otimes n}$ describes the actions of the (memoryless) channel $G$ on the sequence

$$\big((x_1[1], x_2[1]), (x_1[2], x_2[2]), \dots, (x_1[n], x_2[n])\big)$$

of length $n$ of channel inputs.

Definition. A pair $(A, G)$ of target and channel functions is $\delta$-feasible if there exist encoders $f_1, f_2$ and a decoder $\phi$ computing the target function $A$ over $G$ with probability of error at most $\delta$.

Remark: We will often consider pairs $(A, G^{\otimes n})$, in which case the definition of $\delta$-feasibility allows for coding over $n$ uses of the channel $G$.

Without loss of generality, we assume that the target function $A$ has no two identical rows or two identical columns, since we could otherwise simply eliminate one of them. For ease of exposition, we will focus on the case

$$\mathcal{U}_1 = \mathcal{U}_2 = \mathcal{U}, \qquad \mathcal{X}_1 = \mathcal{X}_2 = \mathcal{X}.$$

To simplify notation, we assume without loss of generality that

$$\mathcal{U} = \{0, 1, \dots, U-1\}, \qquad \mathcal{X} = \{0, 1, \dots, X-1\},$$
$$\mathcal{W} = \{0, 1, \dots, W-1\}, \qquad \mathcal{Y} = \{0, 1, \dots, Y-1\}.$$

Finally, to avoid trivial cases, we assume that all cardinalities are strictly bigger than one, and that $W \leq U^2$.

We denote by $\mathcal{A}(U, W)$ the collection of all target functions $a\colon \mathcal{U} \times \mathcal{U} \to \mathcal{W}$. Similarly, we denote by $\mathcal{G}(X, Y)$ the collection of all channels $g\colon \mathcal{X} \times \mathcal{X} \to \mathcal{Y}$. The next example introduces several target functions $A$ and channels $G$ that will be used to illustrate results in the remainder of the paper.

Example 1. We start by introducing four target functions $a(\cdot,\cdot)$.

• Let $\mathcal{W} = \mathcal{U} \times \mathcal{U}$. The identity target function is

$$a(u_1, u_2) = (u_1, u_2)$$

for all $u_1, u_2 \in \mathcal{U}$. Since we will refer to the identity target function repeatedly, we will denote it by the symbol $A_I$.

• Let $\mathcal{W} = \{0, 1\}$. The equality target function is

$$a(u_1, u_2) = \begin{cases} 1, & \text{if } u_1 = u_2, \\ 0, & \text{otherwise}, \end{cases}$$

for all $u_1, u_2 \in \mathcal{U}$.

• Let $\mathcal{W} = \{0, 1\}$. The greater-than target function is

$$a(u_1, u_2) = \begin{cases} 1, & \text{if } u_1 > u_2, \\ 0, & \text{otherwise}, \end{cases}$$

for all $u_1, u_2 \in \mathcal{U}$.

• A random target function corresponds to the matrix $A$ being a random variable, with each entry chosen independently and uniformly over $\mathcal{W}$. The matrix $A$ is generated before communication begins and is known at both the transmitters and at the receiver.

We now introduce three channels $g(\cdot,\cdot)$.

• Let $\mathcal{X} = \{0, 1\}$ and $\mathcal{Y} = \{0, 1, 2\}$. The binary adder MAC is given by

$$g(x_1, x_2) = x_1 + x_2$$

for all $x_1, x_2 \in \mathcal{X}$, where $+$ denotes ordinary addition.

• Let $\mathcal{X} = \{0, 1\}$ and $\mathcal{Y} = \{0, 1\}$. The Boolean $\vee$ or Boolean OR MAC is

$$g(x_1, x_2) = \begin{cases} 0, & \text{if } x_1 = x_2 = 0, \\ 1, & \text{otherwise}, \end{cases}$$

for all $x_1, x_2 \in \mathcal{X}$.

• A random channel corresponds to the matrix $G$ being a random variable, with each entry chosen independently and uniformly over $\mathcal{Y}$. The matrix $G$ is generated before communication begins and is known at both the transmitters and at the receiver.
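As an illustration only, the objects of Example 1 can be tabulated for small alphabets as the matrices $A$ and $G$ defined above. The Python sketch below uses plain nested lists; the helper names (`identity_target`, `equality_target`, `greater_than_target`, `random_matrix`) are hypothetical, and drawing the random matrices with Python's `random` module under a fixed seed is an assumption of the sketch, not part of the model.

```python
import random

def identity_target(U):
    """A_I: entry (u1, u2) is the pair (u1, u2), so the range has U^2 elements."""
    return [[(u1, u2) for u2 in range(U)] for u1 in range(U)]

def equality_target(U):
    """Equality target function: 1 if u1 == u2, else 0."""
    return [[int(u1 == u2) for u2 in range(U)] for u1 in range(U)]

def greater_than_target(U):
    """Greater-than target function: 1 if u1 > u2, else 0."""
    return [[int(u1 > u2) for u2 in range(U)] for u1 in range(U)]

def random_matrix(rows, cols, size, seed=0):
    """Entries i.i.d. uniform over {0, ..., size-1}; serves as a random target
    function (size = W) or a random channel (size = Y)."""
    rng = random.Random(seed)
    return [[rng.randrange(size) for _ in range(cols)] for _ in range(rows)]

# Channels with input alphabet {0, 1}
BINARY_ADDER = [[x1 + x2 for x2 in range(2)] for x1 in range(2)]     # outputs {0, 1, 2}
BOOLEAN_OR = [[int(x1 or x2) for x2 in range(2)] for x1 in range(2)]  # outputs {0, 1}

print(equality_target(3))        # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(greater_than_target(3))    # [[0, 0, 0], [1, 0, 0], [1, 1, 0]]
print(BINARY_ADDER, BOOLEAN_OR)  # [[0, 1], [1, 2]] [[0, 1], [1, 1]]
```

Row index $u_1$ and column index $u_2$ follow the matrix convention $a_{u_1,u_2} = a(u_1, u_2)$ used in the text.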



The emphasis in this paper is on the asymptotic behavior for large function domains, i.e., as $U \to \infty$. We allow the other cardinalities $X(U)$, $Y(U)$, and $W(U)$ to scale as a function of $U$. We use the notation

$$X(U) \lesssim U^{\alpha}$$

for the relation

$$\limsup_{U \to \infty} \frac{\log X(U)}{\log U} \leq \alpha,$$

and analogously for $<$ (with a strict inequality). Similarly, we use

$$X(U) \gtrsim U^{\alpha}$$

for the relation

$$\liminf_{U \to \infty} \frac{\log X(U)}{\log U} \geq \alpha,$$

and analogously for $>$. Finally,

$$X(U) \doteq U^{\alpha}$$

is shorthand for

$$X(U) \lesssim U^{\alpha} \quad \text{and} \quad X(U) \gtrsim U^{\alpha}.$$

For example, $X(U) \doteq U^{\alpha}$ is equivalent to $X(U) = U^{\alpha \pm o(1)}$ as $U \to \infty$.³ With slight abuse of notation, we will write $X(U) \lesssim U^{O(1)}$ to mean that $X(U) \lesssim U^{\eta}$ for some finite $\eta$.

Throughout this paper, we are interested in efficient computation of the target function $a(\cdot,\cdot)$ over the channel $g(\cdot,\cdot)$. In Theorems 1 and 2, only a single use of the channel is permitted, and efficiency is expressed in terms of the required cardinalities $X(U)$ and $Y(U)$ of the channel alphabets as a function of $U$. In Theorems 3 and 4, multiple uses of the channel are allowed, and efficiency is then naturally expressed in terms of the number of required channel uses $n(U)$ as a function of $U$.

Finally, all results are stated in terms of the fraction of channels (in Theorems 1 and 2) or target functions (in Theorem 4) for which successful computation is possible. The proofs of all the theorems are based on probabilistic methods using a uniform distribution over choices of channel $g(\cdot,\cdot)$ or target function $a(\cdot,\cdot)$.

III. Main Results

Let $A_I \in \mathcal{A}(U, U^2)$ be the identity target function introduced in Example 1, and let $G$ be an arbitrary channel matrix. Consider any other target function $A \in \mathcal{A}(U, W)$ over the same domain $\mathcal{U} \times \mathcal{U}$, but with possibly different range $\mathcal{W}$. Assume $(A_I, G)$ is $\delta$-feasible. Then $(A, G)$ is also $\delta$-feasible, since we can first compute $A_I$ (and hence $u_1$ and $u_2$) over the channel $G$ and then simply apply the function $A$ to the recovered messages $\hat{u}_1$ and $\hat{u}_2$. This architecture, separating the computation task from the communication task, is illustrated in Fig. 2.

Fig. 2. Separation-based scheme computing the function $a(\cdot,\cdot)$ over the MAC $g(\cdot,\cdot)$. The receiver first decodes the original messages $(\hat{u}_1, \hat{u}_2)$ and then evaluates the desired target function $a(\hat{u}_1, \hat{u}_2)$.
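The separation architecture of Fig. 2 can be sketched in a few lines for zero error: if the channel outputs on the $U \times U$ message grid are all distinct, the receiver inverts them to recover $(\hat{u}_1, \hat{u}_2)$ and then applies $a$. This is an illustrative sketch only; the function name `separation_decoder` and the injective channel $g(x_1, x_2) = X x_1 + x_2$ (with $Y = X^2$ outputs) are hypothetical choices made to keep the example small.

```python
def separation_decoder(a, g, f1, f2, U):
    """Receiver of Fig. 2 (a sketch): invert the channel on the U x U message
    grid to recover (u1_hat, u2_hat), then apply the target function a."""
    inverse = {}
    for u1 in range(U):
        for u2 in range(U):
            y = g(f1(u1), f2(u2))
            if y in inverse:
                # Two message pairs share an output, so A_I is not 0-feasible
                # with these encoders and this receiver cannot be built.
                raise ValueError(f"channel output {y} is not unique")
            inverse[y] = (u1, u2)

    def decoder(y):
        u1_hat, u2_hat = inverse[y]  # communication step: recover both messages
        return a(u1_hat, u2_hat)     # computation step: evaluate the target function

    return decoder


U = X = 4

def g(x1, x2):
    """Hypothetical injective MAC with Y = X**2 outputs."""
    return X * x1 + x2

def greater_than(u1, u2):
    return int(u1 > u2)

def encoder(u):
    return u  # identity encoder, valid because X >= U

phi = separation_decoder(greater_than, g, encoder, encoder, U)
errors = sum(phi(g(encoder(u1), encoder(u2))) != greater_than(u1, u2)
             for u1 in range(U) for u2 in range(U))
print(errors)  # 0
```

The decoder never uses any structure of the greater-than function: the same `inverse` table serves every target function on $\mathcal{U} \times \mathcal{U}$, which is exactly why separation works for all of them at once.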

As a concrete example, let $A$ be the greater-than target function introduced in Example 1. The range $\mathcal{W} = \{0, 1\}$ of $A$ has cardinality two. On the other hand, the identity function $A_I$ has range $\mathcal{U} \times \mathcal{U}$ of cardinality $U^2$. In other words, for large $U$, the identity target function is considerably more complicated than the greater-than target function. As a result, one might expect that the separation-based architecture in Fig. 2 is highly suboptimal in terms of the computation efficiency as described in Section II. As the main result of this paper, we prove that this intuition is wrong in most cases. Instead, we show that for most pairs $(A, G)$ of target function and MAC, separation of computation and communication is close to optimal.

We discuss the single channel-use case in Section III-A and the $n$ channel-uses case in Section III-B.



A. Single Channel Use ($n = 1$)

In this section, we will focus on the case where the target function needs to be computed using just one use of the channel. The natural value of the upper bound on the probability of error is $\delta = 0$ in this case. In other words, we will be interested in $0$-feasibility.

³ Note that the notation "$f(U)$ is $o(1)$ as $U \to \infty$" stands for $\lim_{U \to \infty} f(U) = 0$.
We start by deriving conditions under which computation of the identity target function over a MAC is feasible. Equivalently, these conditions guarantee that any target function with the same domain cardinality $U$ can be computed over a MAC by separating communication and computation as discussed above.

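Theorem 1. Let $A_I \in \mathcal{A}(U, U^2)$ be the identity target function, and assume

$$X(U) \gtrsim U, \tag{1a}$$
$$Y(U) \gtrsim U^2. \tag{1b}$$

Then

$$\lim_{U \to \infty} \frac{\big|\{G \in \mathcal{G}(X(U), Y(U)) : (A_I, G) \text{ is } 0\text{-feasible}\}\big|}{\big|\mathcal{G}(X(U), Y(U))\big|} = 1.$$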

The proof of Theorem 1 is reported in Section IV-B. Recall that $\mathcal{G}(X, Y)$ is the collection of all channels $G$ of dimension $X \times X$ and range of cardinality $Y$. Theorem 1 (together with the separation approach discussed earlier) thus roughly implies that any target function with a domain of cardinality $U$ can be computed over most MACs of input cardinality $X(U)$ of order at least $U$ and output cardinality $Y(U)$ of order at least $U^2$. The precise meaning of "most" is that the fraction of channels $G$ in $\mathcal{G}(X, Y)$ for which the statement holds goes to one as $U \to \infty$. A look at the proof of the theorem shows that the convergence to this limit is, in fact, exponentially fast. In other words, the fraction of channels for which the theorem fails to hold is exponentially small in the domain cardinality $U$.

Since the achievable scheme is separation based, this conclusion holds regardless of the cardinality $W(U)$ of the range of the target function. Similarly, since it is clear that the channel input must have cardinality $X(U)$ of at least order $U$ for successful computation, we see that the condition on $X(U)$ in Theorem 1 is not a significant restriction. What is significant, however, is the restriction that $Y(U)$ is at least of order $U^2$. The next result shows that this restriction on $Y(U)$ is essentially also necessary.

Before we state the theorem, we need to introduce one more concept. 

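Definition. Consider a target function $a\colon \mathcal{U} \times \mathcal{U} \to \mathcal{W}$. For a set $\mathcal{W}' \subseteq \mathcal{W}$, consider

$$a^{-1}(\mathcal{W}') \triangleq \{(u_1, u_2) \in \mathcal{U} \times \mathcal{U} : a(u_1, u_2) \in \mathcal{W}'\}.$$

For $c \in (0, 1/2]$, the target function $a(\cdot,\cdot)$ is said to be $c$-balanced if there exists a partition $\mathcal{W}_1, \mathcal{W}_2$ of $\mathcal{W}$ such that

$$|a^{-1}(\mathcal{W}_i)| \geq c \cdot U^2$$

for all $i \in \{1, 2\}$.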

Most functions are $c$-balanced for any constant $c < 1/3$ and any $W(U)$, as long as $U$ is large enough. Indeed, choosing $\mathcal{W}_1 = \{0, \dots, \lfloor W(U)/2 \rfloor - 1\}$ and $\mathcal{W}_2 = \{\lfloor W(U)/2 \rfloor, \dots, W(U) - 1\}$ shows that, for any such $c$,

$$\lim_{U \to \infty} \frac{\big|\{A \in \mathcal{A}(U, W(U)) : A \text{ is } c\text{-balanced}\}\big|}{\big|\mathcal{A}(U, W(U))\big|} = 1, \tag{2}$$

where we recall that $\mathcal{A}(U, W)$ denotes the collection of all target functions $A$ of dimension $U \times U$ and range of cardinality $W$. In fact, the convergence in (2) is again exponentially fast⁴ in $U$. Moreover, many functions of specific interest are balanced.

Example 2. Consider the target functions introduced in Example 1.

• The identity and the greater-than target functions are $c$-balanced for any constant $c < 1/2$ and $U$ large enough.

• The equality target function is not $c$-balanced for any constant $c > 0$ as $U \to \infty$. Indeed, since $W(U) = 2$ in this case, the only choice (up to labeling) is to set $\mathcal{W}_1 = \{0\}$ and $\mathcal{W}_2 = \{1\}$. Then $|a^{-1}(\mathcal{W}_1)| = U^2 - U$ and $|a^{-1}(\mathcal{W}_2)| = U$, which is not $c$-balanced for any constant $c > 0$ as $U \to \infty$.
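The balance computations in Example 2 can be checked by brute force for small $U$. The sketch below (illustrative; the name `balance` is hypothetical) returns the largest $c$ for which a given matrix is $c$-balanced, i.e., the maximum over partitions $\mathcal{W}_1, \mathcal{W}_2$ of $\min(|a^{-1}(\mathcal{W}_1)|, |a^{-1}(\mathcal{W}_2)|)/U^2$. It enumerates all $2^{W-1} - 1$ partitions, so it is only meant for small range cardinalities $W$.

```python
import random
from collections import Counter
from itertools import combinations

def balance(A, W):
    """Largest c such that A is c-balanced (brute force over partitions of
    {0, ..., W-1} into two nonempty sets W1, W2)."""
    U = len(A)
    counts = Counter(v for row in A for v in row)  # |a^{-1}({w})| for each w
    best = 0
    # Put value 0 in W1 so each partition is visited once; keep W2 nonempty.
    for k in range(W - 1):
        for extra in combinations(range(1, W), k):
            n1 = counts[0] + sum(counts[w] for w in extra)  # |a^{-1}(W1)|
            best = max(best, min(n1, U * U - n1))
    return best / U**2

U = 8
equality = [[int(u1 == u2) for u2 in range(U)] for u1 in range(U)]
greater_than = [[int(u1 > u2) for u2 in range(U)] for u1 in range(U)]
print(balance(equality, 2))      # 0.125  = U / U^2
print(balance(greater_than, 2))  # 0.4375 = (U^2 - U) / 2 / U^2

# One draw of a random target function with W = 3 (seed chosen arbitrarily)
rng = random.Random(0)
A_rand = [[rng.randrange(3) for _ in range(64)] for _ in range(64)]
print(balance(A_rand, 3))
```

The equality value $1/U$ shrinks to zero as $U$ grows, while the greater-than value approaches $1/2$ from below, matching the two bullets of Example 2.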



⁴ This follows directly from results on the convergence of empirical distributions.
(3a)
(3b) 



The proof of Theorem 2 is reported in Section IV-C. Recall that the notation $X(U) \lesssim U^{O(1)}$ is used to indicate that $X(U)$ grows at most polynomially in $U$, an assumption that is quite mild. Thus, Theorem 2 roughly states that regardless of the value of $W(U)$, if the cardinality $Y(U)$ of the channel output is order-wise less than $U^2$, then any balanced target function with a range of cardinality $W(U)$ cannot be computed over most MACs. Here the precise meaning of "most" is again that the fraction of channel matrices with at most $Y(U)$ channel outputs for which successful computation is possible converges to zero, and a look at the proof reveals again that this convergence is, in fact, exponentially fast in $U$.

Comparing this to Theorem 1, we see that the same scaling of $Y(U)$ allows computation of a target function using a separation-based scheme (i.e., by first recovering the two messages $(u_1, u_2)$ and then applying the target function to compute the estimate $\hat{w} = a(\hat{u}_1, \hat{u}_2)$). Thus, for the computation of a given balanced function over most MACs, separation of computation and communication is essentially optimal. Moreover, since most functions are balanced by (2), the same also holds for most pairs $(A, G)$ of target and channel functions.

Example 3. Let $A$ be the greater-than target function of domain $\mathcal{U} \times \mathcal{U}$ introduced in Example 1. Note that this target function has range of cardinality $W(U) = 2$, i.e., $A$ is binary. From Example 2, we know that $A$ is $c$-balanced for any constant $c < 1/2$ and $U$ large enough. Thus Theorem 2 applies, showing that, for large $U$ and most MACs $G$, separation of computation and communication is essentially optimal.

Observe that the receiver is interested in only a single bit of information about $(u_1, u_2)$. Nevertheless, the structure of the greater-than target function is complicated enough that, in order to recover this single bit, the decoder is essentially forced to learn $(u_1, u_2)$ itself. In other words, in order to compute the single desired bit, communication of $2\log(U)$ message bits is essentially necessary.

Theorem 2 is restricted to balanced functions. Even though only a vanishingly small fraction of target functions is not balanced, it is important to understand this restriction. We illustrate this through the following example.

Example 4. Assume $W(U) = 2$ and

$$X(U) \gtrsim U, \tag{4a}$$
$$Y(U) \gtrsim U. \tag{4b}$$

Let $A_= \in \mathcal{A}(U, 2)$ be the equality target function introduced in Example 1. Then

$$\lim_{U \to \infty} \frac{\big|\{G \in \mathcal{G}(X(U), Y(U)) : (A_=, G) \text{ is } 0\text{-feasible}\}\big|}{\big|\mathcal{G}(X(U), Y(U))\big|} = 1. \tag{5}$$

The proof of the above statement is reported in Section IV-D. This result shows that the equality function can be computed over a large fraction of MACs with output cardinality $Y(U)$ of order at least $U$. This contrasts with the output cardinality $Y(U)$ of order $U^2$ that is required for successful computation of balanced functions in Theorem 2. Recall from Example 2 that the equality target function is not $c$-balanced for any $c > 0$ and $U$ large enough. Thus, (5) does not contradict Theorem 2. It does, however, show that for unbalanced functions separation of communication and computation can be suboptimal.
B. Multiple Channel Uses ($n > 1$)

In this section, we allow multiple uses of the MAC. Our emphasis will again be on the asymptotic behavior for large function domains, $U \to \infty$. However, in this section we keep the MAC $g(\cdot,\cdot)$, and hence also the cardinalities of the channel domain $\mathcal{X}$ and channel range $\mathcal{Y}$, fixed. Instead, we characterize the minimum number $n = n(U)$ of channel uses required to compute the target function.

We begin by stating a result for the identity target function introduced in Example 1. Equivalently, this result applies to any target function (with the same domain cardinality $U$) by using a scheme separating communication and computation. Let $H(x)$ denote the entropy of a random variable $x$.

Theorem 3. Fix a constant $\delta > 0$ independent of $U$, and assume that $X$ and $Y$ are constant. Let $A_I \in \mathcal{A}(U, U^2)$ be the identity target function, and let $G \in \mathcal{G}(X, Y)$ be any MAC. Consider any joint distribution of the form $p(q)p(x_1 \mid q)p(x_2 \mid q)p(y \mid x_1, x_2)$, where $p(y \mid x_1, x_2)$ is specified by the channel function $G$. Then, for any $n(U)$ satisfying

$$U^2 < 2^{n(U) H(y \mid q)}, \tag{6a}$$
$$U < 2^{n(U) H(y \mid x_1, q)}, \tag{6b}$$
$$U < 2^{n(U) H(y \mid x_2, q)}, \tag{6c}$$

$(A_I, G^{\otimes n(U)})$ is $\delta$-feasible for $U$ large enough.

The result follows directly from the characterization of the achievable rate region for ordinary communication over the MAC; see, for example, [18, Theorem 14.3.3]. Using separation, Theorem 3 implies that, for large enough $U$, any target function of domain cardinality $U$ can be reliably computed over $n(U)$ uses of a MAC $G$ as long as $n(U)$ satisfies the constraints in (6). The next result states that for most functions these restrictions on $n(U)$ are essentially also necessary.
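For a concrete deterministic MAC, the entropies in (6) can be evaluated directly. The sketch below is illustrative only: it holds the time-sharing variable $q$ constant, uses hypothetical helper names, and reads (6a)-(6c) as literal inequalities in base-2 logarithms, whereas Theorem 3 itself only guarantees feasibility for $U$ large enough.

```python
from math import log2

def entropy(pmf):
    """Entropy in bits of a pmf given as a dict {value: probability}."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)

def output_pmf(G, w1, w2):
    """Distribution of y = G[x1][x2] for independent inputs x1 ~ w1, x2 ~ w2."""
    pmf = {}
    for x1, p1 in enumerate(w1):
        for x2, p2 in enumerate(w2):
            pmf[G[x1][x2]] = pmf.get(G[x1][x2], 0.0) + p1 * p2
    return pmf

def point_mass(x, X):
    return [1.0 if i == x else 0.0 for i in range(X)]

def rate_entropies(G, w1, w2):
    """(H(y), H(y|x1), H(y|x2)) with the time-sharing variable q held constant."""
    X = len(G)
    h_y = entropy(output_pmf(G, w1, w2))
    h_y_x1 = sum(w1[x] * entropy(output_pmf(G, point_mass(x, X), w2)) for x in range(X))
    h_y_x2 = sum(w2[x] * entropy(output_pmf(G, w1, point_mass(x, X))) for x in range(X))
    return h_y, h_y_x1, h_y_x2

def smallest_n(log2_U, h_y, h_y_x1, h_y_x2):
    """Smallest integer n satisfying (6a)-(6c) in log2 form."""
    if min(h_y, h_y_x1, h_y_x2) <= 0:
        raise ValueError("all three entropies must be positive")
    n = 1
    while not (2 * log2_U < n * h_y and log2_U < n * h_y_x1 and log2_U < n * h_y_x2):
        n += 1
    return n

ADDER = [[0, 1], [1, 2]]  # binary adder MAC from Example 1
h = rate_entropies(ADDER, [0.5, 0.5], [0.5, 0.5])
print(h)                   # (1.5, 1.0, 1.0)
print(smallest_n(10, *h))  # 14 for U = 2**10, since (6a) needs n > 20 / 1.5
```

With uniform inputs on the binary adder MAC, (6a) is the binding constraint, which is why only $2\log U / H(y)$ appears in Example 5.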

Theorem 4. Assume that⁵

$$W(U) \geq \omega(1)$$

as $U \to \infty$, that $0 < \delta < 1/(2\ln(W(U)))$, and that $X$ and $Y$ are constant. Let $G \in \mathcal{G}(X, Y)$ be any MAC. Then, for any $n(U)$ satisfying

I { A G A{U, W{U)) ■ (A, G®'^(^)) is 5-feasible] \ 

u^oo \A{U,W{U))\ ^ °' 

we must have 

u < 2"(^)-'^(yi^2,q) 

for some joint distribution of the form $p(q)p(x_1 \mid q)p(x_2 \mid q)p(y \mid x_1, x_2)$, where $p(y \mid x_1, x_2)$ is specified by the channel function $G$.

The proof of Theorem 4 is presented in Section IV-E. Recall that $\mathcal{A}(U, W)$ denotes the collection of all target functions $A$ of dimension $U \times U$ and range of cardinality $W$. Together, Theorems 3 and 4 thus show that, for any deterministic MAC and most target functions, the smallest number of channel uses $n^*(U)$ that enables reliable computation is of the same order as that needed for the identity function. Moreover, they show that for most such pairs, separation of communication and computation is essentially optimal even if we allow multiple uses of the channel and nonzero error probability. Here the precise meaning of "most" is that the statement holds for all but a vanishing fraction of functions. Moreover, the proof of the theorem shows again that this fraction is, in fact, exponentially small in $U$.

⁵ Note that the notation "$W(U) \geq \omega(1)$ as $U \to \infty$" stands for $\lim_{U \to \infty} W(U) = \infty$.
Example 5. Let $G$ be the binary adder MAC introduced in Example 1. Define

$$H^*(G) \triangleq \max_{x_1, x_2} H\big(g(x_1, x_2)\big) = 3/2,$$

where the maximization is over all independent random variables $x_1, x_2$ taking values in the channel input alphabet $\mathcal{X}$. $H^*(G)$ denotes the maximum entropy that can be induced at the channel output. For the binary adder MAC $G$, it follows from (6) in Theorem 3 that the identity function can be reliably computed over $G$ if the number of channel uses satisfies $n(U) > 2\log U / H^*(G) = 4\log U/3$. On the other hand, Theorem 4 shows that for most functions $A$ of domain $\mathcal{U} \times \mathcal{U}$ and range cardinality $W(U) = \log(U)$, the smallest number of channel uses $n^*(U)$ required for reliable computation is of order $4\log(U)/3$. Thus, near-optimal performance can be achieved by separating computation and communication. In other words, even though the receiver is only interested in $\log\log(U)$ function bits, it is essentially forced to learn the $2\log(U)$ message bits as well.
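The value $H^*(G) = 3/2$ can be sanity-checked numerically. The sketch below is illustrative only and uses a coarse grid search rather than an exact optimization: it parametrizes the independent inputs as $x_1 \sim \text{Bernoulli}(p)$ and $x_2 \sim \text{Bernoulli}(q)$ and evaluates the entropy of the binary adder output. The grid step of 0.01 is an arbitrary choice that happens to contain the maximizer $p = q = 1/2$.

```python
from math import log2

def entropy(probs):
    """Entropy in bits of a list of probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)

def adder_output_entropy(p, q):
    """H(x1 + x2) for independent x1 ~ Bernoulli(p), x2 ~ Bernoulli(q)."""
    return entropy([(1 - p) * (1 - q), p * (1 - q) + (1 - p) * q, p * q])

grid = [k / 100 for k in range(101)]
h_star, p_best, q_best = max((adder_output_entropy(p, q), p, q)
                             for p in grid for q in grid)
print(h_star, p_best, q_best)  # 1.5 0.5 0.5

# Threshold 2 log U / H*(G) = 4 log U / 3 from (6a), e.g. for U = 2**30
print(2 * 30 / h_star)         # 40.0
```

At $p = q = 1/2$ the output distribution is $(1/4, 1/2, 1/4)$, whose entropy is exactly $3/2$ bits.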

This example also illustrates that the usual way of proving converse results based on the cut-set bound 
is not tight for most (A, G) pairs. For example, tMj Lemma 13] shows that for reliable computation we 
need to have 

$$n(U) H^*(G) \geq H\big(a(u_1, u_2)\big),$$

where $H(\cdot)$ denotes entropy. Since $A$ has range of cardinality $W(U)$, we have

$$H\big(a(u_1, u_2)\big) \leq \log\big(W(U)\big).$$

For $W(U) = \log(U)$ and $H^*(G) = 3/2$ as considered here, the tightest bound that can in the best case be derived via the cut-set approach is thus

$$n^*(U) \geq \log\big(W(U)\big)/H^*(G) = 2\log\log(U)/3.$$

However, we know that the correct scaling for $n^*(U)$ is $4\log(U)/3$. Hence, the cut-set bound is loose by an unbounded factor as $U \to \infty$.
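To see how quickly the gap opens up, the sketch below evaluates both expressions for $U = 2^k$, using base-2 logarithms and $H^*(G) = 3/2$ as in the example. Their ratio simplifies to $2\log U / \log\log U$, which grows without bound. The function names are hypothetical and the code is only an arithmetic illustration.

```python
from math import log2

H_STAR = 1.5  # H*(G) of the binary adder MAC

def cut_set_bound(U):
    """Best case of the cut-set bound: log(W(U)) / H*(G) with W(U) = log U."""
    return log2(log2(U)) / H_STAR

def correct_scaling(U):
    """Order of n*(U) from Theorems 3 and 4: 2 log U / H*(G) = 4 log U / 3."""
    return 2 * log2(U) / H_STAR

def ratio(U):
    return correct_scaling(U) / cut_set_bound(U)

for k in (16, 256, 4096):
    U = 2 ** k  # math.log2 accepts arbitrarily large Python integers
    print(f"U = 2^{k}: cut-set {cut_set_bound(U):.2f}, "
          f"correct {correct_scaling(U):.2f}, ratio {ratio(U):.1f}")
```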

IV. Proofs 

We now prove the main results. The proofs of Theorems 1 and 2 are reported in Sections IV-B and IV-C, respectively. The proof of (5) in Example 4 is presented in Section IV-D. Finally, the proof of Theorem 4 is covered in Section IV-E. We start by presenting some preliminary observations in Section IV-A.

A. Preliminaries 

Recall our assumption that no two rows or two columns of the target function $A$ are identical. As a result, $A$ can be computed over the channel $G$ with zero error, i.e., $(A, G)$ is $0$-feasible, if and only if there exists a $U \times U$ submatrix (with ordered rows and columns) $S$ of $G$ such that any two entries $(u_1, u_2)$ and $(u_1', u_2')$ with $a_{u_1,u_2} \neq a_{u_1',u_2'}$ satisfy $s_{u_1,u_2} \neq s_{u_1',u_2'}$; see Fig. 3.
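For tiny alphabets, this criterion can be checked exhaustively by trying every ordered choice of $U$ rows and $U$ columns of $G$ (i.e., every pair of injective encoders). The sketch below is illustrative only, since the search is exponential in $U$; the helper names are hypothetical. It reproduces the situation of Fig. 3: the equality function with $U = 2$ is not $0$-feasible over one use of the Boolean OR MAC, but it is over $G^{\otimes 2}$ with the inputs 01 and 10.

```python
from itertools import permutations, product

def satisfies_condition(A, G, rows, cols):
    """Check the ordered submatrix S = G[rows][cols]: entries with different
    target values must have different channel outputs."""
    seen = {}  # channel output -> target value it must decode to
    for u1, x1 in enumerate(rows):
        for u2, x2 in enumerate(cols):
            s = G[x1][x2]
            if seen.setdefault(s, A[u1][u2]) != A[u1][u2]:
                return False
    return True

def zero_error_feasible(A, G):
    """Brute-force search over ordered U x U submatrices of G (tiny cases only)."""
    U, X = len(A), len(G)
    return any(satisfies_condition(A, G, rows, cols)
               for rows in permutations(range(X), U)
               for cols in permutations(range(X), U))

def n_fold(G, n):
    """Matrix of G^{(x)n}: inputs are n-tuples in lexicographic order,
    outputs are tuples of per-use channel outputs."""
    seqs = list(product(range(len(G)), repeat=n))
    return [[tuple(G[a][b] for a, b in zip(s1, s2)) for s2 in seqs] for s1 in seqs]

EQ = [[1, 0], [0, 1]]  # equality target function, U = 2
OR = [[0, 1], [1, 1]]  # Boolean OR MAC
OR2 = n_fold(OR, 2)    # index 1 is input 01, index 2 is input 10

print(zero_error_feasible(EQ, OR))               # False
print(zero_error_feasible(EQ, OR2))              # True
print(satisfies_condition(EQ, OR2, (1, 2), (1, 2)))  # True: the encoders of Fig. 3
```

With inputs 01 and 10, equal messages produce 01 or 10 at the output while unequal messages produce 11, so the decoder can separate the two function values.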

On the other hand, this is not necessary if a probability of error $\delta > 0$ can be tolerated. As an example, consider the equality function. For any positive $\delta$, a trivial decoder that always outputs $0$ computes the equality function with probability of error $1/U$. As $U \to \infty$, the probability of error is eventually less than $\delta$. This motivates the following definition.

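Given a target function $a\colon \mathcal{U} \times \mathcal{U} \to \mathcal{W}$, a function $a_\delta\colon \mathcal{V}_1 \times \mathcal{V}_2 \to \mathcal{W}$ with

$$V_1 \triangleq |\mathcal{V}_1| \leq U, \qquad V_2 \triangleq |\mathcal{V}_2| \leq U$$

is said to be a $\delta$-approximation of $a(\cdot,\cdot)$ if there exist two mappings $f_1\colon \mathcal{U} \to \mathcal{V}_1$ and $f_2\colon \mathcal{U} \to \mathcal{V}_2$ such that

$$\big|\{(u_1, u_2) \in \mathcal{U} \times \mathcal{U} : a(u_1, u_2) \neq a_\delta(f_1(u_1), f_2(u_2))\}\big| \leq \delta U^2. \tag{7}$$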
Fig. 3. Structure of a zero-error computation scheme over a MAC. The target function $A$ corresponds to the equality function, the MAC matrix $G$ corresponds to the Boolean $\vee$ MAC, and $G^{\otimes 2}$ denotes the 2-fold use of channel $G$. While the function $A$ cannot be computed over $G$ in one channel use, it can be computed in two channel uses by assigning the channel input 01 to user message 0 and 10 to user message 1. The corresponding ordered submatrix $S$ of $G^{\otimes 2}$ is indicated in bold lines in the figure.



In words, the target function $a(\cdot,\cdot)$ is equal to the approximation function $a_\delta(f_1(\cdot), f_2(\cdot))$ for at least a $(1-\delta)$ fraction of all message pairs. As before, a $\delta$-approximation function $a_\delta$ can be represented by a $V_1 \times V_2$ matrix $A_\delta$. We have the following straightforward relation.

Lemma 5. Consider a target function $A$ and MAC $G$. If $(A, G)$ is $\delta$-feasible, then there exists a $\delta$-approximation $A_\delta$ of $A$ such that $(A_\delta, G)$ is 0-feasible.

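Proof: Let $f_1$, $f_2$, and $\phi$ be the encoders and the decoder achieving probability of error at most $\delta$ for $(A, G)$. Let $\mathcal{V}_i$ be the range of $f_i$, and set

$$a_\delta(f_1(u_1), f_2(u_2)) \triangleq \phi(g(f_1(u_1), f_2(u_2)))$$

for all $u_1, u_2 \in \mathcal{U}$. Since the messages are uniformly distributed, an error probability of at most $\delta$ means that at most $\delta U^2$ message pairs are decoded incorrectly. Hence $a_\delta(\cdot,\cdot)$ is a $\delta$-approximation of $a(\cdot,\cdot)$, and $(a_\delta, g)$ is 0-feasible. ■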
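The construction in this proof can be mirrored in a few lines, which also shows how condition (7) is checked. The sketch below is a toy instance under assumed inputs: the channel is $n = 2$ uses of the Boolean OR MAC acting on bit strings, the target is the equality function with $U = 3$, and the hypothetical encoder reuses the inputs 01 and 10 of Fig. 3, so that messages 1 and 2 share a codeword. The resulting table $a_\delta$ lives on the $2 \times 2$ set of used inputs, and the fraction of message pairs where it disagrees with $a$ (here $2/9$) equals the decoder's error probability under uniform messages, so $a_\delta$ is a $\delta$-approximation for every $\delta \ge 2/9$.

```python
from fractions import Fraction

def or_mac(x1: str, x2: str) -> str:
    """n uses of the Boolean OR MAC, applied bitwise to two equal-length bit strings."""
    return "".join("1" if b1 == "1" or b2 == "1" else "0" for b1, b2 in zip(x1, x2))

def induced_approximation(U, f1, f2, g, phi):
    """a_delta(f1(u1), f2(u2)) = phi(g(f1(u1), f2(u2))), as in the proof of Lemma 5."""
    return {(f1(u1), f2(u2)): phi(g(f1(u1), f2(u2)))
            for u1 in range(U) for u2 in range(U)}

def mismatch_fraction(U, a, a_delta, f1, f2):
    """Fraction of pairs (u1, u2) with a(u1, u2) != a_delta(f1(u1), f2(u2)); cf. (7)."""
    bad = sum(a(u1, u2) != a_delta[(f1(u1), f2(u2))] for u1 in range(U) for u2 in range(U))
    return Fraction(bad, U * U)

U = 3
a = lambda u1, u2: int(u1 == u2)                 # equality target function
encoder = {0: "01", 1: "10", 2: "10"}.get        # hypothetical encoder: messages 1 and 2 collide
decoder = lambda y: 0 if y == "11" else 1        # declare "equal" unless the output is 11

a_delta = induced_approximation(U, encoder, encoder, or_mac, decoder)
delta = mismatch_fraction(U, a, a_delta, encoder, encoder)
print(len(a_delta), delta)   # 4 2/9
```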
We will make frequent use of the Chernoff bound, which we recall here for future reference. Let $z_1, z_2, \ldots, z_N$ be independent random variables, and let

$$z \triangleq \sum_{i=1}^N z_i.$$

By Markov's inequality,

$$\mathbb{P}(z \ge b) \le \min_{t>0} \exp(-tb)\prod_{i=1}^N \mathbb{E}(\exp(tz_i)). \qquad (8)$$

Assume furthermore that each $z_i$ takes value in $\{0,1\}$, and set

$$\mu \triangleq \mathbb{E}(z).$$

Then, for any $\gamma > 0$,

$$\mathbb{P}(z \ge (1+\gamma)\mu) \le \Big(\frac{e^\gamma}{(1+\gamma)^{1+\gamma}}\Big)^\mu, \qquad (9)$$

and, for $0 < \gamma < 1$,

$$\mathbb{P}(z \le (1-\gamma)\mu) \le \exp(-\mu\gamma^2/2), \qquad (10)$$

see for example [19, Theorem 4.1, Theorem 4.2].
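Since (9) and (10) are used repeatedly below, it may help to see them next to exact binomial tails. The sketch compares both bounds with the exact tail probabilities of a sum of $N$ independent Bernoulli($p$) variables; the parameter values are arbitrary choices for illustration.

```python
import math

def binom_pmf(N: int, p: float, i: int) -> float:
    return math.comb(N, i) * p ** i * (1 - p) ** (N - i)

def upper_tail(N: int, p: float, threshold: float) -> float:
    """Exact P(z >= threshold) for z ~ Binomial(N, p)."""
    return sum(binom_pmf(N, p, i) for i in range(max(0, math.ceil(threshold)), N + 1))

def lower_tail(N: int, p: float, threshold: float) -> float:
    """Exact P(z <= threshold) for z ~ Binomial(N, p)."""
    return sum(binom_pmf(N, p, i) for i in range(0, math.floor(threshold) + 1))

def bound_9(mu: float, gamma: float) -> float:   # right-hand side of (9)
    return (math.exp(gamma) / (1 + gamma) ** (1 + gamma)) ** mu

def bound_10(mu: float, gamma: float) -> float:  # right-hand side of (10)
    return math.exp(-mu * gamma ** 2 / 2)

for N, p in [(50, 0.1), (200, 0.3)]:
    mu = N * p
    for gamma in (0.2, 0.5, 0.9):
        print(N, p, gamma,
              upper_tail(N, p, (1 + gamma) * mu) <= bound_9(mu, gamma),
              lower_tail(N, p, (1 - gamma) * mu) <= bound_10(mu, gamma))
```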
B. Proof of Theorem 1

A scheme can compute the identity target function with zero error if and only if the channel outputs corresponding to any two distinct pairs of user messages are different. In what follows, we will show that such a scheme for computing the identity target function over any MAC $G$ exists whenever the elements of $G$ take at least $X^2(U) - X(U) + 1$ distinct values in $\mathcal{Y}$. We then argue that, if the assumptions on $X(U)$ and $Y(U)$ in (1) are satisfied, a random MAC $G$ (as introduced earlier) of dimension $X(U) \times X(U)$ has at least $X^2(U) - X(U) + 1$ distinct entries with high probability as $U \to \infty$. Together, this will prove the theorem.

Note that (1) implies that $X \ge 4U$ and $Y \ge 64e^2U^3$ for $U$ large enough. We will prove the result under these two weaker assumptions on $X(U)$ and $Y(U)$. Since we can always choose to ignore part of the channel inputs, we may assume without loss of generality that $X(U) = 4U$, in which case $Y \ge e^2X^3$. In order to simplify notation, we suppress the dependence of $Y = Y(U)$ and $X = X(U)$ on $U$ in the remainder of this and all other proofs.

Given an arbitrary MAC $G$, create a bipartite graph $B$ as follows (see Fig. 4). Let the vertices on each of the two sides of $B$ be $\mathcal{X}$. Now, consider a value $y \in \mathcal{Y}$ appearing in $G$. This $y$ corresponds to a collection of vertex pairs $(x_1,x_2)$ such that $g_{x_1,x_2} = y$. Pick exactly one arbitrary such vertex pair $(x_1,x_2)$ and add it as an edge to $B$. Repeat this procedure for all values of $y$ appearing in $G$. Thus, the total number of edges in $B$ is equal to the number of distinct entries in the channel matrix $G$.







Fig. 4. Bipartite graph $B$ representing the channel matrix $G$. The left vertices correspond to the possible channel inputs at transmitter one, and the right vertices correspond to the possible channel inputs at transmitter two. Each edge of $B$ corresponds to a distinct value in $G$. Thus, the number of edges in $B$ is equal to the number of distinct channel outputs. The existence of a $U \times U$ complete subgraph $K_{U,U}$ corresponds to the existence of two sets of channel inputs, each of size $U$, such that all corresponding channel outputs are different. Thus, these sets of channel inputs can be used to compute the identity target function over $G$ with zero error.

Observe that any complete $U \times U$ bipartite subgraph $K_{U,U}$ of the bipartite graph $B$ corresponds to a computation scheme for the identity function. Indeed, by construction each edge in $B$ corresponds to a different channel output. Hence, by encoding $u_1$ and $u_2$ as the left and right vertices, respectively, of the subgraph $K_{U,U}$, we can uniquely recover $(u_1,u_2)$ from the channel output.
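The construction of $B$ and the resulting zero-error scheme can be sketched as follows. The helper names are ours; the sketch keeps, for each distinct channel output, the first input pair at which it appears (any choice works), and checks whether a candidate set of rows and columns spans a complete bipartite subgraph of $B$, in which case the channel outputs identify the message pair uniquely.

```python
def build_bipartite_graph(G):
    """One edge (x1, x2) per distinct value of the channel matrix G."""
    first_pair = {}
    for x1, row in enumerate(G):
        for x2, y in enumerate(row):
            first_pair.setdefault(y, (x1, x2))
    return set(first_pair.values())

def identity_decoder(G, rows, cols, B):
    """Return a decoder table y -> (u1, u2) if rows x cols spans a complete
    bipartite subgraph of B, else None."""
    if not all((x1, x2) in B for x1 in rows for x2 in cols):
        return None
    # every edge of B carries a distinct output, so this table is injective
    return {G[x1][x2]: (u1, u2) for u1, x1 in enumerate(rows) for u2, x2 in enumerate(cols)}

G = [[0, 1, 2],
     [3, 4, 1],
     [5, 6, 7]]
B = build_bipartite_graph(G)
print(len(B))                                  # 8 distinct outputs, hence 8 edges
print(identity_decoder(G, [0, 2], [0, 1], B))  # {0: (0, 0), 1: (0, 1), 5: (1, 0), 6: (1, 1)}
print(identity_decoder(G, [0, 1], [1, 2], B))  # None: output 1 appears twice
```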

This problem of finding a subgraph $K_{U,U}$ in the bipartite graph $B$ is closely related to the Zarankiewicz problem, see for example [20, Chapter VI]. Formally, the aim in the Zarankiewicz problem is to characterize $z_b(n)$, the smallest integer $m$ such that every bipartite graph with $n$ vertices on each side and $m$ edges contains a subgraph isomorphic to $K_{b,b}$. The Kővári-Sós-Turán theorem, see for example [20, Theorem VI.2.2], states that

$$z_b(n) \le (b-1)^{1/b}(n-b+1)n^{1-1/b} + (b-1)n + 1. \qquad (11)$$

Using (11), we now argue that the bipartite graph $B$ defined above contains a complete $U \times U$ bipartite subgraph $K_{U,U}$ if the number of edges in $B$ is at least $X^2 - X + 1$. By definition, $B$ contains $K_{U,U}$ if there are at least $z_U(X)$ edges in $B$. By (11),

$$z_U(X) \le (U-1)^{1/U}(X-U+1)X^{1-1/U} + (U-1)X + 1. \qquad (12)$$

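Using the inequality $(1-x)^n \ge 1 - nx$ for $x \in [0,1]$ and that $X = 4U$ by assumption, we have

$$\Big(\frac{X-U}{X-U+1}\Big)^U = \Big(1 - \frac{1}{X-U+1}\Big)^U \ge 1 - \frac{U}{X-U+1} = \frac{2U+1}{X-U+1} \ge \frac{U-1}{X}.$$

Taking the $U$-th root and rearranging gives $(U-1)^{1/U}(X-U+1)X^{1-1/U} \le X(X-U)$. Combining this with (12) shows that

$$z_U(X) \le X(X-U) + (U-1)X + 1 = X^2 - X + 1.$$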
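The chain of inequalities leading from (12) to $z_U(X) \le X^2 - X + 1$ can be sanity-checked numerically for $X = 4U$. The sketch below evaluates the intermediate chain and the slack between $X^2 - X + 1$ and the right-hand side of (12); it is a floating-point check over a finite range of $U$, not a proof.

```python
def kst_bound(b: int, n: int) -> float:
    """Right-hand side of (11): an upper bound on the Zarankiewicz number z_b(n)."""
    return (b - 1) ** (1 / b) * (n - b + 1) * n ** (1 - 1 / b) + (b - 1) * n + 1

def chain_holds(U: int) -> bool:
    """Check ((X-U)/(X-U+1))^U >= (2U+1)/(X-U+1) >= (U-1)/X for X = 4U."""
    X = 4 * U
    lhs = ((X - U) / (X - U + 1)) ** U
    return lhs >= (2 * U + 1) / (X - U + 1) >= (U - 1) / X

def kst_slack(U: int) -> float:
    """(X^2 - X + 1) minus the right-hand side of (12) for X = 4U.
    A positive value means z_U(X) <= X^2 - X + 1."""
    X = 4 * U
    return (X * X - X + 1) - kst_bound(U, X)

print(all(chain_holds(U) for U in range(2, 500)),
      min(kst_slack(U) for U in range(2, 500)) > 0)
```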

Thus we have shown that the identity target function can be computed over any channel $G$ with $X = 4U$ if it has at least $X^2 - X + 1$ distinct entries. Consider now a random channel $G$. The next lemma shows that $G$ satisfies this condition with high probability as $X \to \infty$.

Lemma 6. Let $N$ be the number of distinct entries in the random channel matrix $G$, and assume $Y \ge e^2X^3$. Then

$$\mathbb{P}(N \ge X^2 - X + 1) \ge 1 - \exp(-(X-2))$$

for $X$ large enough.

The proof of Lemma 6 is reported in Appendix A. Lemma 6 shows that with probability at least $1 - \exp(-(X-2))$ the identity target function can be computed with zero error over the random MAC $G$. Since $X = 4U$, so that $X \to \infty$ as $U \to \infty$, the statement of the theorem follows. ■
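A small Monte Carlo experiment illustrates Lemma 6. The sketch draws random channel matrices with i.i.d. entries uniform over $\{0,\ldots,Y-1\}$ (the model assumed for the random MAC), takes the smallest admissible $Y = \lceil e^2X^3 \rceil$, and compares the empirical frequency of the event $N \ge X^2 - X + 1$ with the lower bound $1-\exp(-(X-2))$. The trial count and seed are arbitrary.

```python
import math
import random

def distinct_entries(X: int, Y: int, rng: random.Random) -> int:
    """Number of distinct values in an X x X matrix with i.i.d. uniform entries in {0, ..., Y-1}."""
    return len({rng.randrange(Y) for _ in range(X * X)})

def success_rate(X: int, trials: int = 200, seed: int = 0) -> float:
    """Empirical P(N >= X^2 - X + 1) for Y = ceil(e^2 X^3)."""
    Y = math.ceil(math.e ** 2 * X ** 3)
    rng = random.Random(seed)
    hits = sum(distinct_entries(X, Y, rng) >= X * X - X + 1 for _ in range(trials))
    return hits / trials

X = 8
print(success_rate(X), 1 - math.exp(-(X - 2)))
```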

C. Proof of Theorem 2

Let $\mathcal{W}_1$ and $\mathcal{W}_2$ be the two sets in the definition of a balanced function. For a MAC $G$, a $U \times U$ ordered submatrix $S$ corresponds to a valid code for computing $A$ with zero error only if there are no common values between the entries in $S$ corresponding to $a^{-1}(\mathcal{W}_1)$ and $a^{-1}(\mathcal{W}_2)$. Consider then a random $G$ (as introduced earlier) and one such ordered submatrix $S$. Observe that the selection of rows and columns of $G$ in $S$ is fixed; the matrix $S$ is random because its entries are derived from the random matrix $G$. Let $N_1$ denote the number of distinct values among the entries corresponding to $a^{-1}(\mathcal{W}_1)$ in $S$. We have the following bound on $N_1$.

Lemma 7. Assume $|a^{-1}(\mathcal{W}_1)| \ge cU^2$, and set

$$N \triangleq \min\{Y/3,\, cU^2/3\}.$$

Then

$$\mathbb{P}(N_1 < N) \le \exp(-cU^2/6).$$

The proof of Lemma 7 is reported in Appendix B. The submatrix $S$ corresponds to a valid code for computing the target function $A$ only if all the entries corresponding to $a^{-1}(\mathcal{W}_2)$ take values from the $Y - N_1$ channel outputs not present in the entries corresponding to $a^{-1}(\mathcal{W}_1)$. Thus the probability of the submatrix $S$ being a valid code for computing the target function $A$ is at most

$$\begin{aligned}
\mathbb{P}(S \text{ is a valid code for } A) &\le \mathbb{P}(N_1 < N) + \Big(1 - \frac{N}{Y}\Big)^{|a^{-1}(\mathcal{W}_2)|}\\
&\overset{(a)}{\le} \exp(-cU^2/6) + \exp(-N|a^{-1}(\mathcal{W}_2)|/Y)\\
&\overset{(b)}{\le} \exp(-cU^2/6) + \exp(-NcU^2/Y)\\
&\overset{(c)}{\le} 2\exp(-\min\{cU^2/6,\, c^2U^4/(3Y)\}), \qquad (13)
\end{aligned}$$

where (a) follows from Lemma 7 and $1 - x \le e^{-x}$, (b) follows since $A$ is $c$-balanced, and (c) follows from the definition of $N$.

The pair $(A, G)$ is 0-feasible if and only if there exists some valid ordered submatrix $S$ with dimension $U \times U$ of the $X \times X$ channel matrix $G$. There are at most $X^{2U}$ ways to choose the rows and columns of this submatrix. Hence, from the union bound and (13), the probability that $(A, G)$ is 0-feasible is at most

$$\begin{aligned}
\mathbb{P}((A,G) \text{ is 0-feasible}) &\le X^{2U}\,\mathbb{P}(S \text{ is a valid code for } A)\\
&\le X^{2U} \cdot 2\exp(-\min\{cU^2/6,\, c^2U^4/(3Y)\})\\
&= \exp\big(-(\min\{cU^2/6,\, c^2U^4/(3Y)\} - 2U\ln(X) - \ln(2))\big).
\end{aligned}$$

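Now, (3) implies that $X \le U^m$ and $Y \le c^2U^3/(12m\ln(U))$ for some finite positive $m$ and $U$ large enough. Then $c^2U^4/(3Y) \ge 4mU\ln(U)$ and $2U\ln(X) \le 2mU\ln(U)$, so the exponent above grows without bound. Hence,

$$\lim_{U \to \infty} \mathbb{P}((A,G) \text{ is 0-feasible}) = 0,$$

as needed to be shown.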
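To see how the exponent behaves, the sketch below evaluates $\min\{cU^2/6, c^2U^4/(3Y)\} - 2U\ln(X) - \ln(2)$ at the largest sizes permitted above, $X = U^m$ and $Y = c^2U^3/(12m\ln(U))$. In that regime $c^2U^4/(3Y) = 4mU\ln(U)$, so the exponent eventually grows like $2mU\ln(U)$. The values $c = 0.1$ and $m = 2$ are arbitrary; for small $U$ the exponent is negative and the bound is vacuous, which is why the argument is asymptotic.

```python
import math

def failure_exponent(U: float, c: float, m: float) -> float:
    """Exponent E(U) with P((A, G) is 0-feasible) <= exp(-E(U)),
    evaluated at X = U^m and Y = c^2 U^3 / (12 m ln U)."""
    ln_U = math.log(U)
    ln_X = m * ln_U                      # ln(U^m), avoids forming U^m explicitly
    Y = c ** 2 * U ** 3 / (12 * m * ln_U)
    inner = min(c * U ** 2 / 6, c ** 2 * U ** 4 / (3 * Y))
    return inner - 2 * U * ln_X - math.log(2)

for U in (1e2, 1e3, 1e4, 1e5):
    print(f"U = {U:.0e}: exponent = {failure_exponent(U, c=0.1, m=2):.3e}")
```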



D. Proof of (5) in Example 4

A scheme computes the equality target function $A_=$ with zero error if and only if the channel outputs corresponding to the set of message pairs $\{(u,u) : u \in \mathcal{U}\}$ are all distinct from those corresponding to the message pairs $\{(u_1,u_2) : u_1 \ne u_2\}$. In what follows, we will first show that this is guaranteed if the channel matrix $G$ satisfies certain properties. We then argue that a random channel matrix $G$ (as introduced earlier) satisfies these properties with high probability.

From (5), we can assume that $X \ge 200U\ln(U)$ and $Y \ge 16U$ for $U$ large enough. We will prove the result under these two weaker conditions. Since the encoders can always choose to ignore some of the channel inputs, we can assume without loss of generality that $X = 200U\ln(U)$. Throughout this proof, we denote by $k$ the largest integer such that $Y/k \ge 16U$. Since $Y/(k+1) < 16U$ and $k + 1 \le 2k$, this implies

$$16U \le Y/k < 32U. \qquad (14)$$

Given an arbitrary MAC $G$, create a bipartite graph $B$ as follows. Let the vertices on the two sides of the bipartite graph correspond to the $X$ different row and column indices of the channel matrix $G$. Fix an arbitrary subset $\mathcal{Y}'$ of $\mathcal{Y}$ of cardinality $k$. Place an edge in $B$ between a node $x_1$ on the left and a node $x_2$ on the right if $g(x_1,x_2) \in \mathcal{Y}'$, see Fig. 5.

An induced matching $M$ in a bipartite graph $B$ is a subset of edges such that (i) no pair of edges in $M$ share a common endpoint and (ii) no pair of edges in $M$ are joined by an edge in $B$. Note that an induced matching $M$ of size $U$ in $B$ corresponds to a zero-error computation scheme for the equality function $A_=$. This follows from the observation that the induced matching provides subsets of channel inputs $\{x_{1,1}, x_{1,2}, \ldots, x_{1,U}\} \subset \mathcal{X}$ and $\{x_{2,1}, x_{2,2}, \ldots, x_{2,U}\} \subset \mathcal{X}$ such that the only pairs of channel inputs for which the channel output is in $\mathcal{Y}'$ are given by $\{(x_{1,j}, x_{2,j}) : j \in \{1, 2, \ldots, U\}\}$. The decoder thus simply maps all channel outputs in $\mathcal{Y}'$ to 1 and all other channel outputs to 0.
Fig. 5. Bipartite graph $B$ representing the channel matrix $G$. The left vertices correspond to the possible channel inputs at transmitter one, and the right vertices correspond to the possible channel inputs at transmitter two. For some fixed subset $\mathcal{Y}' \subset \mathcal{Y}$ of cardinality $k$, the graph contains an edge between two vertices $x_1, x_2$ if the corresponding channel output $g(x_1,x_2)$ is an element of $\mathcal{Y}'$. The existence of an induced matching of size $U$ corresponds to a scheme for computing the equality target function.
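The correspondence between induced matchings and equality schemes can be exercised on a small random instance. The sketch below builds $B$ for a fixed subset $\mathcal{Y}'$, extracts an induced matching greedily (a simple way of finding some induced matching, not the counting argument used in the proof), and checks that the decoder "output 1 iff $y \in \mathcal{Y}'$" computes the equality function on the $|M|$ messages the matching supports. Sizes, seed, and helper names are arbitrary assumptions.

```python
import random

def edges_for_subset(G, Y_sub):
    """Edges (x1, x2) of B: input pairs whose channel output lies in the fixed subset Y_sub."""
    return {(x1, x2) for x1, row in enumerate(G) for x2, y in enumerate(row) if y in Y_sub}

def greedy_induced_matching(edges):
    """Keep an edge if it shares no endpoint with, and is not joined by an edge to,
    any previously kept edge."""
    matching = []
    for a, b in sorted(edges):
        if any(a == a2 or b == b2 or (a, b2) in edges or (a2, b) in edges
               for a2, b2 in matching):
            continue
        matching.append((a, b))
    return matching

def equality_scheme_is_correct(G, Y_sub, matching):
    """User 1 sends x_{1,j}, user 2 sends x_{2,l}; the decoder outputs 1 iff y is in Y_sub."""
    return all((G[x1][x2] in Y_sub) == (j == l)
               for j, (x1, _) in enumerate(matching)
               for l, (_, x2) in enumerate(matching))

rng = random.Random(1)
X, Y = 30, 60
G = [[rng.randrange(Y) for _ in range(X)] for _ in range(X)]
Y_sub = {0, 1}                        # a fixed subset of k = 2 channel outputs
M = greedy_induced_matching(edges_for_subset(G, Y_sub))
print(len(M), equality_scheme_is_correct(G, Y_sub, M))
```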



A strong edge-coloring of a graph $B$ is an edge-coloring in which every color class is an induced matching, i.e., any two vertices belonging to distinct edges with the same color are not adjacent. The strong chromatic index $\chi'_s(B)$ is the minimum number of colors in a strong edge-coloring of $B$. A simple argument in [21] shows that for any graph $B$,

$$\chi'_s(B) \le 2\Delta^2(B),$$

where $\Delta(B)$ is the maximum degree of $B$. Thus, since the largest color class of a strong edge-coloring is an induced matching, a graph $B$ contains an induced matching of size at least

$$\frac{m(B)}{\chi'_s(B)} \ge \frac{m(B)}{2\Delta^2(B)}, \qquad (15)$$

where $m(B)$ denotes the number of edges in $B$.
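One standard way to see a bound of this form is a greedy argument: an edge $(a,b)$ conflicts only with edges having an endpoint in the neighborhood of $a$ or of $b$, and there are fewer than $2\Delta^2(B)$ of those, so greedy coloring never needs more than $2\Delta^2(B)$ colors. The sketch below implements that greedy coloring on a random bipartite graph (size, density, and seed are arbitrary) and reports the number of colors, the bound $2\Delta^2$, and the largest color class, whose size is at least $m(B)$ divided by the number of colors used.

```python
import random
from collections import defaultdict

def greedy_strong_edge_coloring(edges):
    """Give each edge the smallest color not used by any edge that shares an endpoint
    with it or is joined to it by an edge; every color class is then an induced matching."""
    left, right = defaultdict(set), defaultdict(set)
    for a, b in edges:
        left[a].add(b)
        right[b].add(a)
    color = {}
    for a, b in sorted(edges):
        forbidden = set()
        # conflicting edges are exactly those with an endpoint in N(a) or in N(b)
        for b2 in left[a]:
            forbidden.update(color[(a2, b2)] for a2 in right[b2] if (a2, b2) in color)
        for a2 in right[b]:
            forbidden.update(color[(a2, b2)] for b2 in left[a2] if (a2, b2) in color)
        c = 0
        while c in forbidden:
            c += 1
        color[(a, b)] = c
    return color

def max_degree(edges):
    deg = defaultdict(int)
    for a, b in edges:
        deg[("L", a)] += 1
        deg[("R", b)] += 1
    return max(deg.values(), default=0)

rng = random.Random(7)
edges = {(a, b) for a in range(40) for b in range(40) if rng.random() < 0.08}
coloring = greedy_strong_edge_coloring(edges)
n_colors = len(set(coloring.values()))
Delta = max_degree(edges)
largest_class = max(list(coloring.values()).count(c) for c in set(coloring.values()))
print(n_colors, 2 * Delta ** 2, largest_class, len(edges) / (2 * Delta ** 2))
```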

Consider again the bipartite graph $B$ constructed from $G$ for some fixed subset $\mathcal{Y}'$. From (15) and the above discussion, we see that $(A_=, G)$ is 0-feasible if

$$\frac{m(B)}{2\Delta^2(B)} \ge U. \qquad (16)$$

We now show that this holds with high probability for a random channel matrix $G$. Since we consider a random $G$, the graph $B$ is itself also random.

We start by deriving a lower bound on the number of edges $m(B)$ in $B$. The event that a particular pair of vertices has an edge in $B$ is equivalent to the corresponding entry in the channel matrix $G$ being an element of $\mathcal{Y}'$, which happens with probability $k/Y$. Since the entries of $G$ are independent, this implies that the number of edges $m(B)$ is a binomial random variable with mean $kX^2/Y$. Thus, using the Chernoff bound (10) with $\gamma = 1/2$, that $X = 200U\ln(U)$ by assumption, and that $Y/k < 32U$ by (14),

$$\begin{aligned}
\mathbb{P}(m(B) < kX^2/(2Y)) &\le \exp(-kX^2/(8Y))\\
&\le \exp(-U\ln^2(U)), \qquad (17)
\end{aligned}$$

which converges to zero as $U \to \infty$.

We continue by deriving an upper bound on the maximum degree $\Delta(B)$ of $B$. Let $\Delta_L(B)$ and $\Delta_R(B)$ denote the maximum degree among the left-side and right-side vertices, respectively. Note that $\Delta_L(B)$ and $\Delta_R(B)$ are identically distributed, as the maximum value among $X$ independent binomial random variables with mean $kX/Y$. Let $z$ be one such binomial random variable. Using the Chernoff bound (9) with $\gamma = 1$,

$$\begin{aligned}
\mathbb{P}(\Delta_L(B) \le 2kX/Y) &= \mathbb{P}(\Delta_R(B) \le 2kX/Y)\\
&= \big(\mathbb{P}(z \le 2kX/Y)\big)^X\\
&\ge \big(1 - (e/4)^{kX/Y}\big)^X.
\end{aligned}$$
By the union bound, and using that $X = 200U\ln(U)$ by assumption, that $Y/k < 32U$ by (14), and that $e/4 \le \exp(-1/3)$, we have

$$\begin{aligned}
\mathbb{P}(\Delta(B) > 2kX/Y) &= \mathbb{P}(\{\Delta_L(B) > 2kX/Y\} \cup \{\Delta_R(B) > 2kX/Y\})\\
&\le 2\big(1 - (1 - (e/4)^{kX/Y})^X\big)\\
&\le \exp\big(\ln(400U\ln(U)) - 200U\ln(U)/(3 \cdot 32U)\big)\\
&= \exp\big(\ln(400\ln(U)) - 13\ln(U)/12\big), \qquad (18)
\end{aligned}$$

where the second inequality also uses $1 - (1-x)^X \le Xx$.

which converges to zero as $U \to \infty$. Using $Y/k \ge 16U$ by (14) and the union bound,

$$\begin{aligned}
\mathbb{P}\Big(\frac{m(B)}{2\Delta^2(B)} \ge U\Big) &\ge \mathbb{P}\Big(\frac{m(B)}{2\Delta^2(B)} \ge \frac{Y}{16k}\Big)\\
&\ge \mathbb{P}\big(\{m(B) \ge kX^2/(2Y)\} \cap \{\Delta(B) \le 2kX/Y\}\big)\\
&\ge 1 - \big(\mathbb{P}(m(B) < kX^2/(2Y)) + \mathbb{P}(\Delta(B) > 2kX/Y)\big),
\end{aligned}$$

which, by (17) and (18), converges to one as $U \to \infty$. Combined with (16), this shows that

$$\lim_{U \to \infty} \mathbb{P}((A_=, G) \text{ is 0-feasible}) = 1,$$

thus proving the claim.
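The constants in (17) and (18) can be checked numerically. The sketch evaluates both exponents with $X = 200U\ln(U)$ and the worst case $Y/k = 32U$ allowed by (14), confirms the simplification to $\ln(400\ln(U)) - 13\ln(U)/12$, and shows that (18) only becomes nontrivial for fairly large $U$.

```python
import math

def exponent_17(U: float) -> float:
    """Lower bound on k X^2 / (8Y) using X = 200 U ln U and the worst case Y/k = 32U."""
    X = 200 * U * math.log(U)
    return X ** 2 / (8 * 32 * U)

def log_bound_18(U: float) -> float:
    """ln of the bound in (18): ln(2X) - X / (3 * 32 U) with X = 200 U ln U."""
    X = 200 * U * math.log(U)
    return math.log(2 * X) - X / (3 * 32 * U)

for U in (1e3, 1e6, 1e12):
    print(f"U = {U:.0e}: kX^2/(8Y) >= {exponent_17(U):.3e}, ln(bound 18) = {log_bound_18(U):.3f}")
```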



E. Proof of Theorem 3

Consider an arbitrary target function $A$ and an arbitrary channel function $G$. Recall the definition of a $\delta$-approximation function in (7). From Lemma 5, $(A, G^{\otimes n})$ is $\delta$-feasible only if there exists some $\delta$-approximation $A_\delta \in \mathcal{W}^{V_1 \times V_2}$ of $A$ such that $V_1, V_2 \le U$ and $(A_\delta, G^{\otimes n})$ is 0-feasible. From the construction in Lemma 5, we can assume without loss of generality that $\mathcal{V}_1, \mathcal{V}_2 \subset \mathcal{X}^n$. Furthermore, we can assume without loss of generality that no two rows and no two columns of $A_\delta$ are identical. Hence, $(A_\delta, G^{\otimes n})$ is 0-feasible only if there exists a $V_1 \times V_2$ ordered submatrix $S$ of $G^{\otimes n}$ computing $A_\delta$, as described in Section IV-A. In the following, denote by $s: \mathcal{V}_1 \times \mathcal{V}_2 \to \mathcal{Y}^n$ the mapping corresponding to $S$.

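Consider now such a $V_1 \times V_2$ ordered submatrix $S$ of $G^{\otimes n}$. For any $T \subset \mathcal{V}_1 \times \mathcal{V}_2$, let $s(T) \subset \mathcal{Y}^n$ denote the range of $s(\cdot,\cdot)$ with the arguments restricted to the subset $T$. Let $\mathbf{v}_1$ and $\mathbf{v}_2$ be independent random variables uniformly distributed over $\mathcal{V}_1$ and $\mathcal{V}_2$, respectively. Consider the random vector

$$\mathbf{y} = (\mathbf{y}[1], \ldots, \mathbf{y}[n]) = s(\mathbf{v}_1, \mathbf{v}_2).$$

Then, for any $y \in \mathcal{Y}^n$, we have

$$\mathbb{P}(\mathbf{y} = y) = \frac{|\{(v_1,v_2) \in \mathcal{V}_1 \times \mathcal{V}_2 : s(v_1,v_2) = y\}|}{V_1V_2}, \qquad (19)$$

and let $H_s(\mathbf{y})$ denote the corresponding entropy of the random vector $\mathbf{y}$. The next result proves the existence of a "typical" set.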

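Lemma 8. Let $S$ be a $V_1 \times V_2$-dimensional ordered submatrix of $G^{\otimes n}$, and let $s: \mathcal{V}_1 \times \mathcal{V}_2 \to \mathcal{Y}^n$ be the corresponding mapping. For any $\varepsilon > 0$, there exists a set $T \subset \mathcal{V}_1 \times \mathcal{V}_2$ such that

$$|T| \ge \frac{\varepsilon}{1+\varepsilon}V_1V_2, \qquad (20a)$$

$$|s(T)| \le 2^{(1+\varepsilon)(2+H_s(\mathbf{y}))}. \qquad (20b)$$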
The proof of Lemma 8 is presented in Section IV-E1. Consider now the event that $(A, G^{\otimes n})$ is $\delta$-feasible for the random target function $A$ (as introduced earlier). As we have seen before, this implies the existence of a $\delta$-approximation $A_\delta$ of $A$ such that $(A_\delta, G^{\otimes n})$ is 0-feasible. Let $S$ be the corresponding ordered submatrix of $G^{\otimes n}$ specifying the encoders, and let $\phi$ be the corresponding decoder. For fixed $\varepsilon > 0$, let $T \subset \mathcal{V}_1 \times \mathcal{V}_2$ be the typical set associated with $S$, as guaranteed by Lemma 8. Since $\phi$ correctly computes $a_\delta(\cdot,\cdot)$ for all elements of $\mathcal{V}_1 \times \mathcal{V}_2$, it does so in particular for all elements of $T$. More formally,

$$a_\delta(v_1,v_2) = \phi(s(v_1,v_2))$$

for all $(v_1,v_2) \in T$.

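Fix an ordered submatrix $S$ of $G^{\otimes n}$ of dimension $V_1 \times V_2$. Let $T$ be the typical set corresponding to $S$. Consider a random $A$, and let $\mathcal{E}_S$ be the event that there exists a $\delta$-approximation $A_\delta$ of dimension $V_1 \times V_2$ and a mapping $\phi: s(T) \to \mathcal{W}$ such that

$$a_\delta(v_1,v_2) = \phi(s(v_1,v_2))$$

for all $(v_1,v_2) \in T$. From the discussion in the last paragraph, we have

$$\mathbb{P}((A, G^{\otimes n}) \text{ is } \delta\text{-feasible}) \le \mathbb{P}\Big(\bigcup_S \mathcal{E}_S\Big) \le \sum_S \mathbb{P}(\mathcal{E}_S). \qquad (21)$$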

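We continue by upper bounding the probability of the event $\mathcal{E}_S$. Fix a mapping $\phi$ and let $\mathcal{A}_\phi$ denote the set of distinct $V_1 \times V_2$ matrices $A_\delta$ with entries in $\mathcal{W}$ such that

$$a_\delta(v_1,v_2) = \phi(s(v_1,v_2))$$

for all $(v_1,v_2) \in T$. Using the union bound, we have

$$\mathbb{P}(\mathcal{E}_S) \le \sum_\phi \sum_{A_\delta \in \mathcal{A}_\phi} \mathbb{P}(A_\delta \text{ is a } \delta\text{-approximation of } A). \qquad (22)$$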

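The number of matrices in $\mathcal{A}_\phi$ is at most

$$|\mathcal{A}_\phi| \le W^{V_1V_2 - |T|} \le W^{V_1V_2/(1+\varepsilon)} \le W^{U^2/(1+\varepsilon)},$$

where the first inequality holds because the entries of $A_\delta$ indexed by $T$ are determined by $\phi$, and the second inequality follows from (20a) in Lemma 8. Since there are at most $W^{|s(T)|}$ mappings from $s(T)$ to $\mathcal{W}$, we have

$$\sum_\phi |\mathcal{A}_\phi| \le W^{|s(T)|}\,W^{U^2/(1+\varepsilon)} \le \exp\big(\ln(W)2^{(1+\varepsilon)(2+H_s(\mathbf{y}))} + \ln(W)U^2/(1+\varepsilon)\big), \qquad (23)$$

where the last inequality follows from (20b) in Lemma 8.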

Consider then a fixed matrix $A_\delta$. The next lemma upper bounds the probability that this fixed $A_\delta$ is, in fact, a $\delta$-approximation of the random target function $A$.

Lemma 9. Fix $0 < \delta < 1 - 1/W$ and an arbitrary matrix $A_\delta$ of dimension $V_1 \times V_2$ with $V_1, V_2 \le U$ and range of cardinality $W$. Let $A$ be the random target function of dimension $U \times U$ and range of cardinality $W$. Then

$$\mathbb{P}(A_\delta \text{ is a } \delta\text{-approximation of } A) \le \exp(2U\ln(U) - \alpha U^2),$$

with

$$\alpha \triangleq (1-\delta)\ln(W(1-\delta)) - (1-\delta).$$

The proof of Lemma 9 is presented in Section IV-E2. Combining (22), (23), and Lemma 9 shows that for any $S$,

$$\mathbb{P}(\mathcal{E}_S) \le \exp\Big(\ln(W)2^{(1+\varepsilon)(2+H_s(\mathbf{y}))} + 2U\ln(U) - \big(\alpha - \ln(W)/(1+\varepsilon)\big)U^2\Big).$$
Substituting the above into (21), we have

$$\begin{aligned}
\mathbb{P}((A, G^{\otimes n}) \text{ is } \delta\text{-feasible}) &\le \sum_S \exp\Big(\ln(W)2^{(1+\varepsilon)(2+H_s(\mathbf{y}))} + 2U\ln(U) - \big(\alpha - \ln(W)/(1+\varepsilon)\big)U^2\Big)\\
&\le \exp\Big(2nU\ln(X) + \ln(W)2^{(1+\varepsilon)(2+\max_S H_s(\mathbf{y}))} + 2(U+1)\ln(U) - \big(\alpha - \ln(W)/(1+\varepsilon)\big)U^2\Big), \qquad (24)
\end{aligned}$$

where the last inequality follows by noting that there are at most $U^2X^{2nU}$ ordered submatrices $S$ of $G^{\otimes n}$ of dimension at most $U \times U$.
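Now, set

$$\varepsilon \triangleq \frac{2}{\ln(W) - 2},$$

and note that $\varepsilon \to 0$ as $U \to \infty$ since $W \ge \omega(1)$ as $U \to \infty$. Recall that

$$\delta \le 1/(2\ln(W))$$

by assumption. Since $\ln(W)/(1+\varepsilon) = \ln(W) - 2$, this implies that

$$\begin{aligned}
\alpha - \frac{\ln(W)}{1+\varepsilon} &= (1-\delta)\ln(W(1-\delta)) - (1-\delta) - \ln(W) + 2\\
&\ge (1-\delta)\ln(1-\delta) + \delta + 1/2\\
&\ge 1/2.
\end{aligned}$$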
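The last chain can be spot-checked numerically. Taking $\varepsilon = 2/(\ln(W)-2)$, the value for which $\ln(W)/(1+\varepsilon) = \ln(W) - 2$, the sketch evaluates $\alpha - \ln(W)/(1+\varepsilon)$ over a grid of $\delta \in [0, 1/(2\ln(W))]$ for a few arbitrary values $W > e^2$; the minimum should never drop below $1/2$. This is a floating-point check, not a proof.

```python
import math

def alpha(delta: float, W: float) -> float:
    """alpha = (1 - delta) ln(W (1 - delta)) - (1 - delta), as in Lemma 9."""
    return (1 - delta) * math.log(W * (1 - delta)) - (1 - delta)

def margin(delta: float, W: float) -> float:
    """alpha - ln(W)/(1 + eps) with eps = 2/(ln W - 2); requires W > e^2."""
    eps = 2 / (math.log(W) - 2)
    return alpha(delta, W) - math.log(W) / (1 + eps)

for W in (10.0, 1e3, 1e9):
    delta_max = 1 / (2 * math.log(W))
    worst = min(margin(delta_max * i / 100, W) for i in range(101))
    print(f"W = {W:.0e}: min margin over 0 <= delta <= 1/(2 ln W) is {worst:.4f}")
```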

Hence, the right-hand side of (24) converges to zero as $U \to \infty$ if the following two conditions hold,

n <U, 

In particular, since W < U'^ without loss of generality, and since £ — t- as f/ — t- oo, this is the case 
whenever 

n < f/, 



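Thus, if

$$\lim_{U \to \infty} \mathbb{P}((A, G^{\otimes n}) \text{ is } \delta\text{-feasible}) > 0,$$

then either

$$n > U, \qquad (25a)$$

or there must exist a submatrix $S$ of $G^{\otimes n}$ of dimension at most $U \times U$ such that

$$U^2 \le 2^{H_s(\mathbf{y})}. \qquad (25b)$$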

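Assume that the latter is true. Let $\mathbf{v}_1, \mathbf{v}_2$ denote independent random variables corresponding to the channel inputs of the two users, as specified by the submatrix $S$. Then we have

$$U^2 \le 2^{H_s(\mathbf{y})} \le 2^{H_s(\mathbf{v}_1) + H_s(\mathbf{y} \mid \mathbf{v}_1)} \le 2^{\log(U) + H_s(\mathbf{y} \mid \mathbf{v}_1)},$$

where the second inequality uses $H_s(\mathbf{y}) \le H_s(\mathbf{v}_1, \mathbf{y})$ and the third uses $H_s(\mathbf{v}_1) = \log(V_1) \le \log(U)$. This implies that

$$U \le 2^{H_s(\mathbf{y} \mid \mathbf{v}_1)}. \qquad (26)$$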

Similarly, we have

$$U \le 2^{H_s(\mathbf{y} \mid \mathbf{v}_2)}. \qquad (27)$$
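From (25b), (26), and (27), it follows that there exists a joint distribution on $\mathcal{X}^n \times \mathcal{X}^n \times \mathcal{Y}^n$ of the form

$$p(v_1, v_2, y) = p(v_1) \times p(v_2) \times \prod_{i=1}^n p(y[i] \mid v_1[i], v_2[i])$$

which satisfies

$$\begin{aligned}
U^2 &\le 2^{H(\mathbf{y})} \le 2^{\sum_{i=1}^n H(\mathbf{y}[i])},\\
U &\le 2^{H(\mathbf{y} \mid \mathbf{v}_1)} \le 2^{\sum_{i=1}^n H(\mathbf{y}[i] \mid \mathbf{v}_1[i])},\\
U &\le 2^{H(\mathbf{y} \mid \mathbf{v}_2)} \le 2^{\sum_{i=1}^n H(\mathbf{y}[i] \mid \mathbf{v}_2[i])}.
\end{aligned}$$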

We can then single-letterize the right-hand side of the above inequalities in the usual way, see for example the proof of [18, Theorem 14.3.3]. It then follows that there exists a joint distribution of the form $p(q)p(x_1 \mid q)p(x_2 \mid q)p(y \mid x_1, x_2)$ such that

$$U^2 \le 2^{nH(y \mid q)}, \qquad U \le 2^{nH(y \mid x_1, q)}, \qquad U \le 2^{nH(y \mid x_2, q)}.$$

Thus, if (25b) holds, then the above inequalities provide a list of necessary conditions for $(A, G^{\otimes n})$ to be $\delta$-feasible with positive probability. It follows easily that the conditions above are also implied if the alternate condition in (25a) holds, thus concluding the proof. ■
It remains to prove Lemmas 8 and 9.

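1) Proof of Lemma 8: Consider a variable-length binary Huffman code for the random variable $\mathbf{y}$ distributed according to (19), and let $\ell(y)$ be the length of the codeword associated with $y$. By [18, Theorem 5.4.1], the expected length

$$L \triangleq \mathbb{E}(\ell(\mathbf{y}))$$

of the code satisfies

$$H_s(\mathbf{y}) \le L < H_s(\mathbf{y}) + 1. \qquad (28)$$

For a fixed $\varepsilon > 0$, let $\mathcal{C} \subset s(\mathcal{V}_1 \times \mathcal{V}_2)$ denote the set of $y$ such that $\ell(y) \le (1+\varepsilon)L$, and define

$$T \triangleq s^{-1}(\mathcal{C})$$

as the elements in $\mathcal{V}_1 \times \mathcal{V}_2$ that are mapped into $\mathcal{C}$. We have

$$|s(T)| = |\mathcal{C}| \overset{(a)}{\le} 2^{(1+\varepsilon)L+1} \overset{(b)}{\le} 2^{(1+\varepsilon)(H_s(\mathbf{y})+2)},$$

where (a) follows since there are at most $2^{(1+\varepsilon)L+1}$ binary sequences of length at most $(1+\varepsilon)L$ and each of them can correspond to at most one value $y \in \mathcal{C}$, and (b) follows from (28).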
On the other hand, we have

$$\begin{aligned}
|T| &= |\{(v_1,v_2) \in \mathcal{V}_1 \times \mathcal{V}_2 : s(v_1,v_2) \in \mathcal{C}\}|\\
&\overset{(a)}{=} V_1V_2 \cdot \mathbb{P}(\mathbf{y} \in \mathcal{C}) = V_1V_2 \cdot \mathbb{P}(\ell(\mathbf{y}) \le (1+\varepsilon)L)\\
&\overset{(b)}{\ge} \frac{\varepsilon}{1+\varepsilon}V_1V_2,
\end{aligned}$$

where (a) follows from (19) and (b) follows from Markov's inequality. Together, this shows the existence of a set $T$ with the desired properties. ■
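The construction in this proof can be reproduced on a toy map $s$. The sketch below uses a hypothetical random $s$ on a $12 \times 12$ grid with outputs in a small alphabet (all sizes and the seed are arbitrary), builds a binary Huffman code for the induced distribution (19), forms $\mathcal{C}$ and $T = s^{-1}(\mathcal{C})$, and lets one check (28), (20a), and (20b) directly.

```python
import heapq
import math
import random
from collections import Counter

def huffman_lengths(probs: dict) -> dict:
    """Codeword lengths of a binary Huffman code for a distribution {symbol: probability}."""
    if len(probs) == 1:
        return {s: 0 for s in probs}           # a single symbol needs no bits
    heap = [(p, i, [s]) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(probs, 0)
    tie = len(heap)                            # unique tie-breaker so lists are never compared
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)
        p2, _, group2 = heapq.heappop(heap)
        for s in group1 + group2:              # each merge adds one bit to its members
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, tie, group1 + group2))
        tie += 1
    return lengths

def typical_set(s: dict, eps: float):
    """Return (T, C, H, L) as in the proof of Lemma 8 for a map s: (v1, v2) -> y."""
    counts = Counter(s.values())
    total = len(s)                             # V1 * V2, since (v1, v2) is uniform
    probs = {y: c / total for y, c in counts.items()}
    H = -sum(p * math.log2(p) for p in probs.values())
    ell = huffman_lengths(probs)
    L = sum(probs[y] * ell[y] for y in probs)
    C = {y for y in probs if ell[y] <= (1 + eps) * L}
    T = [pair for pair, y in s.items() if y in C]
    return T, C, H, L

rng = random.Random(3)
V1, V2, eps = 12, 12, 0.5
s = {(v1, v2): rng.choice("abcdefgh") + rng.choice("ab") for v1 in range(V1) for v2 in range(V2)}
T, C, H, L = typical_set(s, eps)
print(len(T), len(C), round(H, 3), round(L, 3))
```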
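2) Proof of Lemma 9: Fix an arbitrary function $a_\delta: \mathcal{V}_1 \times \mathcal{V}_2 \to \mathcal{W}$ with $V_1, V_2 \le U$, and fix arbitrary maps $f_1: \mathcal{U} \to \mathcal{V}_1$ and $f_2: \mathcal{U} \to \mathcal{V}_2$. Denote by $z_{u_1,u_2}$ the indicator variable of the event

$$\{a(u_1,u_2) = a_\delta(f_1(u_1), f_2(u_2))\}.$$

In words, $z_{u_1,u_2} = 1$ if the target function $a(\cdot,\cdot)$ is correctly approximated by $a_\delta(\cdot,\cdot)$ at $(u_1,u_2)$. Since the entries of $A$ are uniformly distributed over $\mathcal{W}$, we have

$$\mathbb{P}(z_{u_1,u_2} = 1) = 1/W.$$

Since the entries of $A$ are independent, the number of message pairs for which the target function $a(\cdot,\cdot)$ is correctly computed using the approximation function $a_\delta(\cdot,\cdot)$ is then described by a binomial random variable

$$z \triangleq \sum_{(u_1,u_2) \in \mathcal{U} \times \mathcal{U}} z_{u_1,u_2}$$

with mean $U^2/W$. Thus the probability that, for fixed maps $f_1, f_2$, the function $a_\delta(\cdot,\cdot)$ is a $\delta$-approximation of the random target function $a(\cdot,\cdot)$ is given by

$$\begin{aligned}
\mathbb{P}(z \ge (1-\delta)U^2) &= \mathbb{P}\big(z \ge (1 + W(1-\delta) - 1)U^2/W\big)\\
&\overset{(a)}{\le} \Big(\frac{\exp(W(1-\delta) - 1)}{(W(1-\delta))^{W(1-\delta)}}\Big)^{U^2/W}\\
&\overset{(b)}{\le} \exp(-\alpha U^2),
\end{aligned}$$

where (a) follows from the Chernoff bound (9), which applies since $W(1-\delta) > 1$, and (b) from the definition of $\alpha$.

The numbers of possible maps $f_1$ and $f_2$ are at most $V_1^U$ and $V_2^U$, respectively. By the union bound, the probability that the function $a_\delta(\cdot,\cdot)$ is a $\delta$-approximation of the random target function $a(\cdot,\cdot)$ is then at most

$$V_1^UV_2^U\exp(-\alpha U^2) \le \exp(2U\ln(U) - \alpha U^2),$$

thus concluding the proof. ■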
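The exponent in Lemma 9 can be compared with the exact binomial tail for small parameters. The sketch assumes, as in the proof, that for fixed maps $f_1, f_2$ the number of correctly approximated pairs is Binomial($U^2$, $1/W$), and checks that its tail probability stays below $\exp(-\alpha U^2)$; the parameter grid is arbitrary and restricted to $\delta < 1 - 1/W$.

```python
import math

def exact_tail(U: int, W: int, delta: float) -> float:
    """P(z >= (1 - delta) U^2) for z ~ Binomial(U^2, 1/W)."""
    n, p = U * U, 1 / W
    k_min = math.ceil((1 - delta) * n)
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k_min, n + 1))

def alpha(delta: float, W: int) -> float:
    """alpha = (1 - delta) ln(W (1 - delta)) - (1 - delta), as in Lemma 9."""
    return (1 - delta) * math.log(W * (1 - delta)) - (1 - delta)

for U, W, delta in [(4, 8, 0.3), (6, 16, 0.5), (8, 4, 0.2)]:
    print(U, W, delta, exact_tail(U, W, delta), math.exp(-alpha(delta, W) * U * U))
```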



Appendix A
Proof of Lemma 6 in Section IV-B

We prove this result by posing it in the framework of the coupon-collector problem, see, e.g., [19, Chapter 3.6]. In each round, a collector obtains a coupon uniformly at random from a collection of $Y$ coupons. Let $z$ denote the number of rounds that are needed until the first time $X^2 - X + 1$ distinct coupons are obtained. Then the event that $z$ is at most $X^2$ is equivalent to the number of distinct entries in the $X \times X$ random channel matrix $G$ being at least $X^2 - X + 1$.

Let

$$N \triangleq X^2 - X + 1.$$
For the coupon-collector problem, the minimum number of rounds $z$ needed to collect $N$ distinct coupons can be written as

$$z = \sum_{i=1}^N z_i, \qquad z_i \sim \mathrm{Geom}(p_i), \qquad p_i = 1 - \frac{i-1}{Y}, \qquad (29)$$

where the $z_i$'s are independent random variables and $\mathrm{Geom}(p_i)$ represents the geometric distribution with parameter $p_i$. Observe that $p_1 \ge p_2 \ge \cdots \ge p_N$.
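From the Chernoff bound (8),

$$\begin{aligned}
\mathbb{P}(z > X^2) &\le \min_{t>0} \exp(-tX^2)\prod_{i=1}^N \mathbb{E}(\exp(tz_i))\\
&= \min_{0<t<-\ln(1-p_N)} \exp(-tX^2)\prod_{i=1}^N \mathbb{E}(\exp(tz_i)).
\end{aligned}$$

We have

$$\mathbb{E}(\exp(tz_i)) = \frac{p_ie^t}{1-(1-p_i)e^t}$$

for $t < -\ln(1-p_i)$. Since the right-hand side is decreasing in $p_i$, we have

$$\mathbb{E}(\exp(tz_i)) \le \mathbb{E}(\exp(tz_N))$$

for every $i \in \{1, 2, \ldots, N\}$. This implies that

$$\mathbb{P}(z > X^2) \le \min_{0<t<-\ln(1-p_N)} \exp(-tX^2)\Big(\frac{p_Ne^t}{1-(1-p_N)e^t}\Big)^N. \qquad (30)$$

Since $Y \ge e^2X^3$ by assumption, we have $-\ln(1-p_N) > 2$ from (29). Assume that this is the case in the following, and set $t = 2$ in (30). Then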



$$\mathbb{P}(z > X^2) \le \exp\Big(-\Big(2X^2 - N\ln\Big(\frac{p_Ne^2}{1-(1-p_N)e^2}\Big)\Big)\Big). \qquad (31)$$



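Now, since $Y \ge e^2X^3$, we have $p_N \ge 1 - 1/(e^2X)$, and thus

$$\frac{p_Ne^2}{1-(1-p_N)e^2} \le \frac{e^2 - 1/X}{1 - 1/X} \le e^{2+1/X}. \qquad (32)$$

Here, the first inequality holds because the left-hand side is increasing in $1 - p_N$, and the last inequality follows by setting $b = 1/X$ in the inequality

$$e^b - be^b - 1 + be^{-2} \ge 0,$$

which holds for all $b \in [0, 1/5]$: the left-hand side evaluates to zero at $b = 0$, increases until $be^b = e^{-2}$ and decreases afterwards, and is still positive at $b = 1/5$. In particular, (32) holds for $X \ge 5$.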

Substituting (32) into (31) and using the definition of $N$ yields

$$\mathbb{P}(z > X^2) \le \exp\big(-(2X^2 - N(2 + 1/X))\big) = \exp\big(-(X - 1 - 1/X)\big) \le \exp(-(X-2)),$$

thus concluding the proof. ■
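Because several constants in this appendix are easy to get wrong, the sketch below checks them numerically for the smallest output alphabet allowed by Lemma 6, $Y = e^2X^3$: that $t = 2$ is admissible, and that the bound (31) is at most $\exp(-(X-2))$ for $5 \le X \le 300$. It also checks that the elementary inequality $e^b - be^b - 1 + be^{-2} \ge 0$, which yields $e^{2+1/X}$ as an upper bound with $b = 1/X$, holds on $[0, 1/5]$ but fails, e.g., at $b = 1/4$, so the restriction to large $X$ matters. This is a floating-point sanity check, not a proof.

```python
import math

def log_bound_31(X: int) -> float:
    """ln of the right-hand side of (31) with N = X^2 - X + 1, Y = e^2 X^3, and t = 2."""
    N = X * X - X + 1
    Y = math.e ** 2 * X ** 3
    p_N = 1 - (N - 1) / Y                     # from (29)
    assert -math.log(1 - p_N) > 2             # t = 2 lies inside the admissible range
    ratio = p_N * math.e ** 2 / (1 - (1 - p_N) * math.e ** 2)
    return -(2 * X * X - N * math.log(ratio))

def g(b: float) -> float:
    """Left-hand side of e^b - b e^b - 1 + b e^{-2} >= 0."""
    return math.exp(b) - b * math.exp(b) - 1 + b * math.exp(-2)

for X in (5, 10, 50, 200):
    print(X, round(log_bound_31(X), 3), -(X - 2))
```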
Appendix B
Proof of Lemma 7 in Section IV-C

Consider again the coupon-collector problem as in Appendix A, and let $z$ denote the number of rounds required to collect $N$ distinct coupons. Then the event that $z$ exceeds $|a^{-1}(\mathcal{W}_1)|$ is equivalent to $N_1$ being less than $N$. Following the proof of Lemma 6, we have from the Chernoff bound (8) that

$$\begin{aligned}
\mathbb{P}(N_1 < N) &= \mathbb{P}(z > |a^{-1}(\mathcal{W}_1)|)\\
&\le \min_{0<t<-\ln(1-p_N)} \exp(-t|a^{-1}(\mathcal{W}_1)|)\Big(\frac{p_Ne^t}{1-(1-p_N)e^t}\Big)^N\\
&\le \min_{0<t<-\ln(1-p_N)} \exp(-tcU^2)\Big(\frac{p_Ne^t}{1-(1-p_N)e^t}\Big)^N, \qquad (33)
\end{aligned}$$

where the last inequality follows since $|a^{-1}(\mathcal{W}_1)| \ge cU^2$ by assumption, and with

$$p_N = 1 - \frac{N-1}{Y}.$$

From the definition of $N$, we have $p_N \ge 2/3$, so that $-\ln(1-p_N) > 1$. Choosing $t = 1/2$ in (33), and noting that

$$\frac{p_Ne^{1/2}}{1-(1-p_N)e^{1/2}} \le \frac{(2/3)e^{1/2}}{1-e^{1/2}/3} \le e,$$

we obtain

$$\mathbb{P}(N_1 < N) \le \exp(-(cU^2/2 - N)) \le \exp(-cU^2/6),$$

thus proving the lemma. ■
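The two facts about geometric random variables used in Appendices A and B can be checked quickly: the closed form of the moment generating function, and that it is decreasing in $p$, so that with $p_N \ge 2/3$ and $t = 1/2$ the factor is at most $(2/3)e^{1/2}/(1-e^{1/2}/3) \approx 2.44 \le e$. The sketch uses geometric variables on $\{1, 2, \ldots\}$, matching the coupon-collector decomposition (29); the grid and truncation length are arbitrary.

```python
import math

def geom_mgf(p: float, t: float) -> float:
    """E[exp(t z)] for z ~ Geom(p) on {1, 2, ...}; valid for t < -ln(1 - p)."""
    return p * math.exp(t) / (1 - (1 - p) * math.exp(t))

def geom_mgf_series(p: float, t: float, terms: int = 500) -> float:
    """Truncated series sum_k e^{t k} p (1 - p)^(k - 1), written as a geometric sum."""
    r = (1 - p) * math.exp(t)                 # common ratio, < 1 in the valid range
    return p * math.exp(t) * sum(r ** (k - 1) for k in range(1, terms + 1))

# the MGF is decreasing in p, so p_N >= 2/3 makes p = 2/3 the worst case
grid = [2 / 3 + i * (1 / 3) / 50 for i in range(51)]
values = [geom_mgf(p, 0.5) for p in grid]
print(round(values[0], 4), round(math.e, 4))
```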